EST. 2021
İSTANBUL · MULTIMODAL FOUNDATION MODEL RESEARCH
CURRENT FOCUS
BULUT-0 PROTOTYPE · ENCODER-ONLY · SINGLE-NODE TRAINING
DESIGN PARTNERS
OPEN CALL · INDUSTRIAL · MOBILITY · EARTH-OBSERVATION
PROGRAMMES
AWS ACTIVATE APPLICANT · NVIDIA INCEPTION APPLICANT
STATUS
PRE-FUNDED · BOOTSTRAPPED · BUILDING IN PUBLIC FROM Q3 2026
Open call · design partners · summer 2026 Read brief →

Foundation models for the physical world.

An İstanbul research lab. We have been working on a single question since 2021: how do you train a model that can reason about the world the way the world actually presents itself — through video, lidar, audio and telemetry, in real time, with provenance you can defend? We are building toward our answer in public, in stages, starting now.

Working since
2021 · İstanbul
Stage
Pre-seed · bootstrapped
Modalities
6 · video · lidar · audio · IMU · text
Current model
BULUT-0 · prototype encoder
Honest about where we are · five years of conviction · zero exaggeration
FOUNDED 2021 · Bootstrapped · NO INVESTORS YET · AWS ACTIVATE APPLICANT · NVIDIA Inception applicant · RECRUITING DESIGN PARTNERS
§ 01 — Why we exist

Language models know about the world. Ours need to know it.

A robot that has read every Wikipedia page still cannot pour a glass of water. The bottleneck is not text — it is grounded experience. We have spent five years convinced that the next decade of useful intelligence belongs to models trained on the world the way it actually arrives.

The next trillion tokens of useful intelligence are not on the web. They are streaming, in real time, from instruments that nobody has labelled yet.

— a sentence we wrote in a notebook in 2022 and have not stopped believing

What we mean by physical intelligence

A model that, given a short clip of an environment, can predict what will happen ten seconds from now, what an agent should do to achieve a goal, and what would have happened had the agent acted differently. Counterfactual, embodied, real-time.

This is the substrate, we believe, for the next decade of robotics, autonomy, climate science and industrial AI — domains where text alone is insufficient. We are not the only ones with this thesis. We may be among the most patient about it.

§ 02 — What we are building

One model family. Three sizes. Honest milestones.

We do not pretend to have shipped what we have not. The three cards below describe what exists today, what is in active development, and what is on the roadmap. We will not announce a milestone until we hit it.
BULUT-1 · ROADMAP

The model we are working toward

in development · target preview Q4 2027

Target architecture

Architecture: Sparse MoE Transformer
Modalities (input): video · lidar · audio · imu · text
Modalities (output): future-frame · depth · policy · text
Context window: target 32 K · 30 s video
Parameters: target 14 B (small), 70 B (large)
Training compute: target ~10²³ FLOPs · cloud-bursted
Hardware: cloud H100 / H200 (rented)
First preview: Q4 2027 · contingent on funding

Architecture sketch

Inputs: video tokens, lidar voxels, audio mels, imu / gnss, text tokens, action codes
Core: MoE TRANSFORMER · target 32 experts · top-2 routing
Outputs: future video, depth · flow, action policy, language, reward
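
The sketch names top-2 routing over a target of 32 experts. As an illustration only, and not a description of the BULUT or stratus code, the snippet below shows the general idea under simplifying assumptions: a token is a single float, each expert is a plain Python callable, and the two selected gate weights are renormalised to sum to one. The names `top2_route` and `moe_forward` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits: List[float]) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) for the two highest-scoring experts.

    The two gate weights are renormalised so that they sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    kept = sum(probs[i] for i in ranked)
    return [(i, probs[i] / kept) for i in ranked]


def moe_forward(
    token: float,
    router_logits: List[float],
    experts: List[Callable[[float], float]],
) -> float:
    """Run only the two routed experts and mix their outputs by gate weight."""
    return sum(w * experts[i](token) for i, w in top2_route(router_logits))


if __name__ == "__main__":
    # Hypothetical toy setup: 32 experts, expert k scales its input by (k + 1).
    experts = [lambda x, k=k: (k + 1) * x for k in range(32)]
    logits = [0.0] * 32
    logits[3] = 2.0
    logits[7] = 2.0
    print(top2_route(logits))
    print(moe_forward(1.0, logits, experts))
```

The point of the design is that only two of the 32 experts run per token, so compute per token stays roughly constant while total parameter count grows with the number of experts.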
EXISTS · INTERNAL

BULUT-0 · prototype

An encoder-only model trained on a small permissioned dataset on a single GPU. It is what we use to test data pipelines, evaluation harnesses and the parts of stratus we have written so far. Useful for nothing yet — by design.

~120 M PARAMS · INTERNAL ONLY · NOT FOR USE
IN DEVELOPMENT

BULUT-1 Mini · target 2027

Our first public release target. A 14 B-parameter distillation we plan to ship with open weights under a research licence once the data engine and evaluation infrastructure are ready. In active development; the timeline is contingent on funding.

~14 B PARAMS · TARGET Q4 2027 · LICENCE TBD
ROADMAP · 2028+

BULUT-1 · target 2028+

The flagship model. A 70-B-class sparse mixture-of-experts trained on cloud-rented GPU capacity with our partner programmes. Realistic only with proper funding, a meaningful customer base and the infrastructure work between now and then.

~70 B PARAMS · TARGET 2028+ · CONTINGENT
§ 03 — Data engine

Five years thinking about provenance. Five months actually capturing.

A foundation-model lab is, before anything else, a data engineering lab. Below is what the data engine looks like today (small, capture-rig-stage) and what we have committed to writing down before we add any more bytes.

Continuous capture · single rig · İstanbul

One mobile multimodal capture rig running across İstanbul and three rural Anatolian provinces. RGB, lidar, IMU, GNSS and 8-channel audio are time-synced and redacted on-device before any frame leaves the rig. Honest scale: tens of terabytes, not petabytes.

The interesting part is not the volume — it is the provenance ledger we have written around it, which we treat as the most important code in the company.

1 capture rig · 4 cities · ~22 TB · all permissioned

Modality coverage today

RGB · stereo · 32-beam lidar · 8-channel audio · IMU · GNSS. Event cameras and 4D radar are planned for 2027, once v2 of the calibration pipeline ships.

6 modalities · time-sync ≤ 5 ms
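
A time-sync budget like the one quoted above can be checked per capture frame. The sketch below is illustrative, not the rig's actual code. It assumes that "time-sync ≤ 5 ms" means the spread between the earliest and latest sensor timestamp within one frame, which the text does not define. The sensor keys and function names are hypothetical.

```python
from typing import Dict

# Assumption: the budget is the maximum allowed spread across sensors, in ms.
SYNC_BUDGET_MS = 5.0


def sync_spread_ms(timestamps_ms: Dict[str, float]) -> float:
    """Spread between the earliest and latest sensor timestamp in one frame."""
    values = list(timestamps_ms.values())
    return max(values) - min(values)


def within_budget(
    timestamps_ms: Dict[str, float], budget_ms: float = SYNC_BUDGET_MS
) -> bool:
    """True when every sensor in the frame falls inside the sync budget."""
    return sync_spread_ms(timestamps_ms) <= budget_ms


if __name__ == "__main__":
    # Hypothetical frame: one timestamp per sensor, in milliseconds.
    frame = {"rgb": 1000.0, "lidar": 1002.5, "audio": 1001.0, "imu": 1000.2, "gnss": 1004.0}
    print(sync_spread_ms(frame), within_budget(frame))
```

Frames that fail the check could be dropped or flagged before they reach training, which keeps cross-modal alignment errors out of the corpus.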

Geographic scope today

İstanbul, Konya plateau, two rural Anatolian provinces. Goal for 2027 is twelve cities across two climate zones — only if the design partner programme funds expansion.

4 cities · 1 climate zone · 1 country

Auto-labelling pipeline

Self-supervised pseudo-labelling and cross-modal consistency checks. Today: founder-only QA. Goal: a small expert annotation team funded through customer pilots, not headcount-burn.

QA team: none yet · to be funded via partners

Provenance & consent · the discipline we ship first

Every byte ingested has a cryptographic provenance chain. On-device facial and licence-plate redaction runs before transmission. Consent ledgers are stored on a tamper-evident log with row-level lineage to every model checkpoint we train. This is already in production for our own corpus, even though the corpus is small.

GDPR · KVKK aligned · SOC 2 path is on the roadmap
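
One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that general technique with SHA-256 from the Python standard library. It is not the lab's ledger implementation, and the record fields (`clip_id`, `consent_id`, `content_sha256`) are hypothetical placeholders.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: Dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log: List[Dict], record: Dict) -> Dict:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Recompute every link; any edited, removed or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[Dict] = []
    # Hypothetical records: field names are illustrative only.
    append_entry(log, {"clip_id": "a1", "consent_id": "c-001", "content_sha256": "ab12"})
    append_entry(log, {"clip_id": "a2", "consent_id": "c-002", "content_sha256": "cd34"})
    print(verify(log))
    log[0]["record"]["consent_id"] = "c-999"  # simulate tampering
    print(verify(log))
```

A chain like this only detects tampering; preventing a full rewrite of the log also requires anchoring the latest hash somewhere the writer cannot change.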

Synthetic counterfactuals · planned

For every real trajectory we will render variants in a Gaussian-splatting twin of the scene to teach causal structure rather than mere correlation. The pipeline exists in prototype; full integration depends on cluster capacity we do not yet own.

Prototype · target activation 2027
§ 04 — Compute

We do not own a cluster yet. We will not pretend otherwise.

Frontier-class training requires frontier-class compute. We are at the rented-cloud stage. Below is what we use today and what the path looks like — including the cloud credit programmes we are actively applying to.

Today · cloud rental

Single-node H100 instances rented from cloud providers when we need them. Most of our work runs on a small on-premises development workstation. We are an active applicant to AWS Activate and the NVIDIA Inception programme, both of which provide compute credits to early-stage startups.

AWS ACTIVATE applicant 2026 · NVIDIA INCEPTION applicant · seeking design-partner-funded compute · CUDA-native

stratus · our training stack in progress

Our internal training framework — early days, but starting to look like something. Designed from day one to elastically scale across heterogeneous rented GPU pools because we know we will never own the metal we train on at frontier scale. Open-sourcing planned for 2027 once we have something worth the README.

Code today
~14 K LOC
Tested up to
8 GPUs
Open source target
2027
Licence target
Apache 2.0
§ 05 — Where this matters

Four target verticals. Open design-partner conversations.

We are not announcing partnerships we have not signed. Below is where we believe physical-AI foundation models matter most, and where we want our first design partners to come from.
— 01

Autonomous mobility

Closed-loop simulation, edge-case mining and behavioural prediction for fleet operators. We do not build vehicles — we believe the stacks that go into them deserve smarter eyes than a per-task CNN ensemble.

target · trucking · last-mile · simulation
— 02

Industrial monitoring

Continuous video understanding of factory floors, ports and energy infrastructure. Anomaly detection, predictive maintenance, OSHA-grade incident reconstruction. The vertical we expect to find our first paying design partner in.

target · manufacturing · ports · energy
— 03

Earth observation & climate

Joint reasoning across satellite, drone and ground-station data: wildfire spread, urban flooding, methane plumes. We are in early conversations with one civil-protection group about a research relationship.

target · satellite · EO · climate
— 04

Robot foundation policies

Generalist manipulation and locomotion policies fine-tuned from BULUT for humanoid, quadruped and arm-robot OEMs. Our most ambitious vertical and the one that benefits most from a foundation-model backbone.

target · humanoids · manipulation · RL
§ 06 — Research notes

Six research lines. Zero published papers — yet.

We have written a great deal in our internal notebooks and published nothing externally. That changes in 2027. Below are the six lines of inquiry our founder-led research team is working on. We will publish what we are confident in, when we are confident.

World models

Latent dynamics, future-frame prediction, counterfactual rollouts. The mathematical core of what we want to build.

First paper target · 2027

Multimodal alignment

Shared representations across video, lidar, audio, IMU and text without paired supervision. Where we are spending most of our time today.

Active · prototype eval harness

Distributed training systems

Mixture-of-experts routing, elastic restarts, fault-tolerant checkpointing across heterogeneous, rented GPU pools.

stratus framework · in progress

Embodied policy learning

Generalist policies for humanoid, quadruped and arm-robot embodiments via fine-tuning from a foundation backbone.

Roadmap · 2028+

Earth observation

Joint training on satellite, drone and ground-station data for nowcasting climate phenomena at sub-km resolution.

Research-collaboration phase

Provenance & alignment

Byte-level audit logs, consent-aware training, model cards as cryptographic artefacts. Already running in production for our small corpus.

Live · the discipline we shipped first
Why a brand-new foundation-model lab in İstanbul · in 2026
CHEAPER ENGINEERING TALENT · Geographic data diversity · PROVENANCE BY DEFAULT · PATIENT FIVE-YEAR THESIS · NO LANGUAGE-MODEL DISTRACTION · Bridge between EU and emerging markets

Help build the substrate of physical AI with us.

Three conversations are open right now: design partners (industrial / mobility / EO), AWS / NVIDIA / cloud credits, and engineers who like the patient-research-lab archetype. We answer mail in three languages within 48 hours.