Made with <3 by Erin
$ pip install language-pipes

The idea

P2P inference for open-source LLMs

Language models run their input through a long stack of transformer layers. Language Pipes cuts that stack into segments and hands each segment to a different machine, so the memory cost is shared across a network you control. No single node needs to hold the whole model and no central server sits between your nodes. It's peer-to-peer, decentralized, and Python-native.
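
To make the idea of cutting the stack into segments concrete, here is a minimal sketch of one way to assign contiguous layer ranges to nodes. It is an illustration, not Language Pipes' actual placement logic: the names `Segment` and `split_layers`, and the per-node capacity expressed as a number of layers, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    node: str
    start: int  # first layer index, inclusive
    end: int    # last layer index, exclusive


def split_layers(num_layers: int, layers_per_node: dict[str, int]) -> list[Segment]:
    """Assign contiguous layer ranges to nodes, in dict order, until all layers are placed."""
    segments = []
    next_layer = 0
    for node, capacity in layers_per_node.items():
        if next_layer >= num_layers:
            break
        take = min(capacity, num_layers - next_layer)
        if take > 0:
            segments.append(Segment(node, next_layer, next_layer + take))
            next_layer += take
    if next_layer < num_layers:
        # Not every layer found a home: the network is too small for this model.
        raise ValueError(f"{num_layers - next_layer} layers do not fit on any node")
    return segments


if __name__ == "__main__":
    # Hypothetical numbers: a 28-layer model, node-1 fits 10 layers, node-2 fits 20.
    for seg in split_layers(28, {"node-1": 10, "node-2": 20}):
        print(seg)
```

In this sketch node-1 receives layers 0 to 9 and node-2 receives layers 10 to 27, so neither machine has to hold the full stack.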

How it works

Layer models and the End model

Inference flows through a pipeline. The End model keeps the text-handling stages (tokenization, embedding, final norm and the output head) on one trusted node. The transformer layers in between are distributed across Layer models on other machines, which only ever see continuous-valued hidden-state tensors.

Pipeline stages, in order:

  • TOKENIZE: End model
  • EMBED: End model
  • LAYERS ×N: Layer models
  • NORM: End model
  • HEAD: End model

End model: text stays here (one trusted node). Layer models: opaque tensors only (distributed). Only hidden-state tensors are sent over the network.

Each layer performs matrix multiplications between learned weights and a hidden-state tensor, then passes the result down the pipe. Hosting the layers on different machines shares the memory cost and keeps text off every node except the one making the request.
See the job processor state machine →
Understand the threat model →
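
The flow described above can be sketched with a toy model in plain Python. This is a simplified illustration under stated assumptions, not the project's code: the classes `EndModel` and `LayerModel`, the four-word vocabulary, and the "residual plus one matrix multiply" stand-in for a transformer layer are all hypothetical. What it aims to show is the division of labour: only `EndModel` touches text or token ids, while each `LayerModel` receives and returns lists of floats.

```python
import random

VOCAB = ["<unk>", "hello", "world", "pipes"]  # toy vocabulary (assumption)
HIDDEN = 4                                     # toy hidden size (assumption)


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class EndModel:
    """Holds every stage that touches text: tokenize, embed, norm, head."""

    def __init__(self, rng):
        self.embedding = make_matrix(len(VOCAB), HIDDEN, rng)
        self.head = make_matrix(len(VOCAB), HIDDEN, rng)

    def tokenize(self, text):
        return [VOCAB.index(w) if w in VOCAB else 0 for w in text.split()]

    def embed(self, token_ids):
        return [list(self.embedding[t]) for t in token_ids]

    def norm(self, hidden):
        # RMS-style normalisation of each position's vector.
        out = []
        for vec in hidden:
            rms = (sum(x * x for x in vec) / len(vec)) ** 0.5 or 1.0
            out.append([x / rms for x in vec])
        return out

    def head_next_token(self, hidden):
        # Score every vocabulary entry against the last position, pick the best.
        logits = matvec(self.head, hidden[-1])
        return VOCAB[max(range(len(logits)), key=logits.__getitem__)]


class LayerModel:
    """Holds a contiguous slice of layers; sees only float tensors, never text."""

    def __init__(self, num_layers, rng):
        self.weights = [make_matrix(HIDDEN, HIDDEN, rng) for _ in range(num_layers)]

    def forward(self, hidden):
        for w in self.weights:
            # Residual + linear map: a stand-in for a real transformer layer.
            hidden = [[x + y for x, y in zip(vec, matvec(w, vec))] for vec in hidden]
        return hidden


def generate_one_token(text, end, layer_nodes):
    hidden = end.embed(end.tokenize(text))
    for node in layer_nodes:  # in a real deployment each hop would cross the network
        hidden = node.forward(hidden)
    return end.head_next_token(end.norm(hidden))


if __name__ == "__main__":
    rng = random.Random(0)
    end = EndModel(rng)
    nodes = [LayerModel(2, rng), LayerModel(3, rng)]
    print(generate_one_token("hello world", end, nodes))
```

The sketch expects a non-empty prompt and produces a single token. A real layer also includes attention and a feed-forward block, which are left out here to keep the data flow visible.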

Quick start

Example

Distribute Qwen/Qwen3-1.7B across two machines. Node 1 hosts the End model, so prompts and responses stay on Node 1, along with as many layers as fit in its memory. Node 2 hosts the remaining layers. Launch the TUI with language-pipes and configure each node, then call Node 1 like any OpenAI endpoint:

node-1 — client.py
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",  # node-1 IP + job port
    api_key="not-needed",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-1.7B",
    messages=[{"role": "user",
               "content": "Write a haiku about distributed systems."}],
)
print(resp.choices[0].message.content)
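
If you would rather not install the openai package, the same call can be sketched with the standard library alone. This assumes the node accepts the standard OpenAI chat-completions request body at `<base_url>/chat/completions`, which is the route the OpenAI client uses for the call above, and that no Authorization header is required. The helper names `build_chat_request` and `send` are made up for this example.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build (but do not send) a POST to the chat-completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the first choice's text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request(
        "http://127.0.0.1:8000/v1",  # node-1 IP + job port, as in client.py
        "Qwen/Qwen3-1.7B",
        "Write a haiku about distributed systems.",
    )
    # Inspect the request without touching the network.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With Node 1 running, `print(send(req))` should print the same kind of reply as the client example above.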

Support

Supported models

Model families today

  • Qwen3
  • Phi 4
  • Meta Llama 2 and 3
  • Gemma 3 and 4

Fine-tunes of a supported base model should work too.
See all tested models →