The Bottleneck of Traditional Networks

Traditional distributed AI fails because it tries to send tokens (words) between nodes.
  • In a conventionally distributed LLM, generating each token requires shipping the entire context window across the internet.
  • This creates severe lag: when Node A generates a token, Node B must wait for it to arrive before it can do anything (see the worked example after this list).
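
To make that wait concrete, the worked example below shows how a purely serial token hand-off caps throughput regardless of GPU speed; the 80 ms round trip is an assumed consumer-internet figure, not a measurement.

```python
# Worked example of the stall described above. The 80 ms round trip is an
# illustrative consumer-internet assumption, not a measured figure.
ROUND_TRIP_MS = 80

# If Node B cannot proceed until Node A's token arrives, each generated token
# pays one network round trip, capping throughput regardless of GPU speed.
max_tokens_per_sec = 1000 / ROUND_TRIP_MS
print(f"Serial token hand-off caps generation at {max_tokens_per_sec:.1f} tokens/sec")
```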

The Breakthrough in Streaming Inference

FAR AI builds on recent innovations in vectorized inference with an approach we call Semantic Vector Streaming (SVS). Unlike conventional inference systems, which transmit discrete token outputs between nodes, SVS restructures the communication layer to operate on continuous semantic embeddings: instead of exchanging raw text tokens, nodes transmit high-dimensional vector states that represent the semantic content of multiple tokens at once. FAR’s SVS layer contains a lightweight Semantic Compression Module (SCM) that:
  • Aggregates an input window of k tokens (typically 4-8).
  • Projects that window into a high-coherence latent vector space using a trained linear or low-rank transformation.
  • Emits a single d-dimensional embedding (d ≪ k × vocab_size).
  • Forwards this embedding to downstream nodes, which decompress it into the model’s expected internal representation (a minimal sketch of this pipeline follows the list).
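
A minimal sketch of what such a compress/decompress pair could look like is shown below, written in PyTorch. The class names (SemanticCompressor, SemanticDecompressor), the window size, the embedding dimension, and the low-rank factorization are all illustrative assumptions for this example rather than FAR’s actual implementation.

```python
# Illustrative sketch of an SCM-style compress/decompress pair (PyTorch).
# Class names, dimensions, and the low-rank factorization are assumptions
# made for this example, not FAR's actual implementation.
import torch
import torch.nn as nn


class SemanticCompressor(nn.Module):
    """Aggregates a window of k token hidden states into one d-dim embedding."""

    def __init__(self, hidden_size: int, k: int, d: int, rank: int = 64):
        super().__init__()
        # Low-rank projection: (k * hidden_size) -> rank -> d
        self.down = nn.Linear(k * hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, d, bias=False)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, k, hidden_size) -> embedding: (batch, d)
        flat = window.reshape(window.shape[0], -1)
        return self.up(self.down(flat))


class SemanticDecompressor(nn.Module):
    """Expands a received d-dim embedding back into k hidden states."""

    def __init__(self, hidden_size: int, k: int, d: int):
        super().__init__()
        self.k = k
        self.hidden_size = hidden_size
        self.expand = nn.Linear(d, k * hidden_size)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # embedding: (batch, d) -> (batch, k, hidden_size)
        out = self.expand(embedding)
        return out.reshape(-1, self.k, self.hidden_size)


# Example: a 4-token window of 4096-dim hidden states becomes one 4096-dim
# embedding on the wire, then is expanded again on the downstream node.
compressor = SemanticCompressor(hidden_size=4096, k=4, d=4096)
decompressor = SemanticDecompressor(hidden_size=4096, k=4, d=4096)
window = torch.randn(1, 4, 4096)
embedding = compressor(window)            # (1, 4096): payload sent between nodes
reconstructed = decompressor(embedding)   # (1, 4, 4096): rebuilt internal states
```

In a real deployment both modules would be trained jointly with the host model so that reconstructed states stay faithful to its internal representation; here they are randomly initialized purely to show the shapes and data flow.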

4× Bandwidth Reduction

By transmitting dense vectors instead of token-level outputs, SVS reduces inter-node communication volume by up to 4× (roughly a 75% reduction). This drastically lowers the bandwidth required to participate in distributed inference: in practice, a typical home fiber connection (50-100 Mbps upstream) becomes sufficient for contributing meaningful GPU compute, without network saturation or added inference latency.
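
As a rough sanity check on that claim, the back-of-the-envelope calculation below estimates the uplink needed to stream fp16 embeddings at a modest generation rate. Every figure in it (embedding dimension, window size, token rate, link speed) is an assumption chosen for illustration rather than a measured FAR AI number.

```python
# Back-of-the-envelope check that SVS embedding traffic fits a home uplink.
# All numbers below are illustrative assumptions, not measured figures.
BYTES_PER_FP16 = 2
D = 4096                 # embedding dimension sent per window
K = 4                    # tokens aggregated per window
TOKENS_PER_SEC = 30      # assumed per-stream generation rate
UPLINK_MBPS = 50         # lower bound of a typical home fiber uplink

windows_per_sec = TOKENS_PER_SEC / K
payload_mbps = D * BYTES_PER_FP16 * windows_per_sec * 8 / 1e6  # ~0.49 Mbps

print(f"SVS embedding traffic: {payload_mbps:.2f} Mbps "
      f"({payload_mbps / UPLINK_MBPS:.1%} of a {UPLINK_MBPS} Mbps uplink)")
```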