Using gRPC Streams for Unary calls

Should we replace unary calls with a bidirectional stream to improve performance?

This is a hot debate that came up during one of our architectural discussions recently.

Google’s gRPC team advises against it. Nevertheless, some argue that, in theory, streams should have lower overhead. In practice, that does not seem to be true.

Theoretically, what could make streams perform worse than unary calls?

Streams ensure that messages are delivered in the order in which they were sent. This means that when messages are sent concurrently, they have to be serialized onto the stream, which creates a bottleneck.
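To make the ordering argument concrete, here is a toy model rather than real gRPC code. It assumes the worst case: every message is sent at the same instant, the stream handler processes messages strictly one at a time in send order, and unary calls are handled fully in parallel. The function names and the service times are made up for illustration.

```python
from itertools import accumulate


def unary_latencies(service_times):
    """Independent calls: each one's latency is just its own service time."""
    return list(service_times)


def ordered_stream_latencies(service_times):
    """One ordered stream: message i completes only after messages 0..i
    have all been handled, so latencies are the running total."""
    return list(accumulate(service_times))


# Hypothetical per-message service times in milliseconds, all sent at t=0.
times = [5, 1, 1, 1]

print("unary :", unary_latencies(times))           # [5, 1, 1, 1]
print("stream:", ordered_stream_latencies(times))  # [5, 6, 7, 8]
```

In this model a single slow message delays every message queued behind it on the stream, while the unary calls are unaffected by each other. Real systems sit somewhere in between, which is why measuring was still necessary.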

gRPC does some smart network optimizations under the hood, though, so we still needed to measure how big the difference actually was.


I ran some quick benchmarks against an existing service and found that unary calls and streams had similar latencies when the number of concurrent requests was low. As the number of concurrent requests increased, unary calls performed far better.

About the benchmark

  • Each request is non-idempotent and fetches real data (from stock exchanges).
  • Latency in each row is an average of 10 repetitions. When running 1000 concurrent requests, only 2 repetitions were made.
  • Both the client and the server are based on Google’s gRPC library for .NET rather than the pure .NET version of the gRPC client.
  • The client and the server were running on the same network. The results could be very different if they were on different networks.
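The measurement procedure described above could be structured roughly as follows. This is a minimal sketch, not the harness used for these numbers: `fake_request` is a placeholder for a real unary or streaming call, and the concurrency levels other than 1000 are assumptions chosen for illustration.

```python
import asyncio
import time
from statistics import mean


async def timed(call):
    """Await one request and return its latency in seconds."""
    start = time.perf_counter()
    await call()
    return time.perf_counter() - start


async def run_batch(call, concurrency):
    """Fire `concurrency` requests at once; return their mean latency."""
    latencies = await asyncio.gather(*(timed(call) for _ in range(concurrency)))
    return mean(latencies)


async def benchmark(call, concurrency, repetitions):
    """Repeat the batch and average the per-batch means."""
    batches = [await run_batch(call, concurrency) for _ in range(repetitions)]
    return mean(batches)


async def fake_request():
    # Placeholder: replace with a real unary call or a stream round trip.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    # (concurrency, repetitions): fewer repetitions at the highest load.
    for concurrency, repetitions in [(10, 10), (100, 10), (1000, 2)]:
        avg = asyncio.run(benchmark(fake_request, concurrency, repetitions))
        print(f"{concurrency:>5} concurrent: {avg * 1000:.1f} ms average")
```

Averaging per batch first keeps each repetition equally weighted, regardless of how many requests it contains.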


For lower concurrent requests, both have comparable latencies. However, for higher loads, unary calls are much more performant.

There is no apparent reason to prefer streams over unary calls. Streams also come with problems of their own: a more complex implementation at the application level, poor load balancing (the client stays connected to one server and ignores any servers added later), and lower resilience to network interruptions.
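The load-balancing point can be illustrated with a small simulation. It is only a sketch under stated assumptions: unary calls are balanced per call in round-robin fashion (which depends on the client or proxy being configured that way), the stream is opened once against the first server and never reconnects, and the server names and call counts are hypothetical.

```python
from collections import Counter


def route_unary(phases):
    """Per-call round-robin over whichever servers exist at the time."""
    counts = Counter()
    i = 0
    for servers, n_calls in phases:
        for _ in range(n_calls):
            counts[servers[i % len(servers)]] += 1
            i += 1
    return counts


def route_stream(phases):
    """A long-lived stream stays pinned to the server it first connected to."""
    counts = Counter()
    pinned = phases[0][0][0]
    for _servers, n_calls in phases:
        counts[pinned] += n_calls
    return counts


# Phase 1: only server "a" exists. Phase 2: server "b" is added.
phases = [(["a"], 4), (["a", "b"], 4)]

print("unary :", dict(route_unary(phases)))   # {'a': 6, 'b': 2}
print("stream:", dict(route_stream(phases)))  # {'a': 8}
```

Once server "b" appears, the unary calls start using it, while everything sent over the existing stream keeps landing on "a".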
