Optimizing Latency and Throughput in Java REST APIs
If you work on high-performance backend systems, latency and throughput are likely your critical metrics. Whether you're running microservices at scale, tuning APIs for heavy traffic, or optimizing database queries, understanding these two measures is crucial.
Latency vs Throughput: What’s the Difference?
| Metric | Definition | Measured In | Optimized By |
|---|---|---|---|
| Latency | Time taken to process a single request | Milliseconds (ms) | Faster request handling |
| Throughput | Number of requests handled per second | Requests per second (RPS) | Better concurrency and scaling |
Key Insight: Low latency doesn’t always mean high throughput. A system may process
requests quickly but handle only a few at a time, limiting overall performance.
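Little's Law makes the relationship concrete: average requests in flight = throughput × latency. A quick sketch (the numbers are purely illustrative):

```java
public class LittlesLaw {
    // Little's Law: average requests in flight = throughput (req/s) * latency (s).
    static double requiredConcurrency(double throughputRps, double latencyMs) {
        return throughputRps * (latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // To sustain 400 RPS at 50 ms per request, roughly 20 requests must be
        // in flight at once -- e.g. 20 worker threads (or async slots).
        System.out.println(requiredConcurrency(400, 50));
    }
}
```

Halving latency at fixed concurrency doubles achievable throughput, which is why the two metrics are tuned together.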
1. Reducing Latency in Java REST APIs
A slow database query is often the biggest bottleneck.
Use Indexing:
```sql
CREATE INDEX idx_user_email ON users(email);
```
Optimize Queries: Avoid the N+1 query problem in JPA with @EntityGraph, which fetches the association in the same query:
```java
public interface CustomerRepository extends JpaRepository<Customer, Long> {
    @EntityGraph(attributePaths = {"orders"})
    List<Customer> findAll();
}
```
Use Connection Pooling: Increase HikariCP's maximum pool size in application.yml:
```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 50
```
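What a pool like HikariCP buys you can be sketched in plain Java: a fixed set of connections is opened once up front and reused, and a request blocks when the pool is exhausted. PooledConn below is a stand-in for a real JDBC connection, not HikariCP's API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of the pooling idea: reuse a fixed set of connections
// instead of paying the cost of opening a new one per request.
public class TinyPool {
    static class PooledConn { final int id; PooledConn(int id) { this.id = id; } }

    private final BlockingQueue<PooledConn> idle;

    TinyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(new PooledConn(i)); // opened once, up front
    }

    PooledConn borrow() throws InterruptedException { return idle.take(); } // blocks when exhausted
    void release(PooledConn c) { idle.offer(c); }
}
```

Blocking on an exhausted pool is also why the pool size should match the database's capacity, not just be made as large as possible.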
Use Caching (Redis, Ehcache, Caffeine):
Cache frequent responses instead of hitting the database every time (requires @EnableCaching on a configuration class):
```java
@Cacheable(value = "users", key = "#id")
public User getUser(Long id) {
    return userRepository.findById(id).orElse(null);
}
```
Redis: Best for distributed caching.
Caffeine: Best for in-memory caching in microservices.
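The mechanism that @Cacheable automates can be sketched with the JDK's ConcurrentHashMap; loadFromDb here simulates the repository call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class UserCacheSketch {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger dbHits = new AtomicInteger(); // counts simulated DB queries

    // Stand-in for userRepository.findById(id)
    private String loadFromDb(Long id) {
        dbHits.incrementAndGet();
        return "user-" + id;
    }

    String getUser(Long id) {
        // First call hits the "database"; later calls are served from memory.
        return cache.computeIfAbsent(id, this::loadFromDb);
    }
}
```

A real cache additionally needs eviction and TTLs, which is exactly what Caffeine or Redis provide on top of this idea.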
Reduce JSON Processing Overhead
Large payloads increase serialization/deserialization time.
Use Jackson Annotations to Exclude Sensitive or Unused Fields:
```java
@JsonIgnoreProperties({"password", "ssn"})
public class User { ... }
```
Enable GZIP Compression in application.yml:
```yaml
server:
  compression:
    enabled: true
    mime-types: application/json,application/xml
```
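The payoff is easy to verify with the JDK's GZIPOutputStream: JSON compresses well because field names repeat for every element. The payload below is made up for the demonstration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive JSON: the keys "id" and "name" recur in every element.
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < 500; i++) {
            json.append("{\"id\":").append(i).append(",\"name\":\"user\"},");
        }
        byte[] raw = json.append("]").toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length + " bytes -> " + gzip(raw).length + " bytes gzipped");
    }
}
```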
Avoid Blocking I/O (Use WebFlux or Async Processing)
Traditional Spring MVC uses blocking I/O, which reduces efficiency under high load.
Use @Async for Asynchronous Execution (requires @EnableAsync; the method already runs on Spring's task executor, so wrap the result rather than adding a second async layer with supplyAsync):
```java
@Async
public CompletableFuture<User> getUser(Long id) {
    // Already on an async thread; just complete the future with the result.
    return CompletableFuture.completedFuture(
            userRepository.findById(id).orElse(null));
}
```
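The latency win appears when independent calls run in parallel rather than back-to-back; a pure-JDK sketch in which the two "remote calls" are simulated with sleeps:

```java
import java.util.concurrent.CompletableFuture;

public class ParallelCalls {
    // Simulates a slow downstream call that takes `millis` to respond.
    static String slowCall(String name, long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Both simulated 100 ms calls run concurrently, so the combined
        // result arrives in roughly 100 ms instead of roughly 200 ms.
        CompletableFuture<String> user =
                CompletableFuture.supplyAsync(() -> slowCall("user", 100));
        CompletableFuture<String> orders =
                CompletableFuture.supplyAsync(() -> slowCall("orders", 100));
        String combined = user.thenCombine(orders, (u, o) -> u + "+" + o).join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(combined + " in ~" + elapsedMs + " ms");
    }
}
```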
2. Increasing Throughput in Java REST APIs
Optimize Thread Management:
By default, Spring Boot's embedded Tomcat caps its worker pool at 200 threads. Increase it in application.yml (Spring Boot 2.3+ uses server.tomcat.threads.max; older versions use server.tomcat.max-threads):
```yaml
server:
  tomcat:
    threads:
      max: 500
```
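What the thread limit means for throughput can be demonstrated with a plain fixed-size executor: however many requests queue up, at most pool-size of them run at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerPoolSketch {
    // Submits `tasks` jobs to a pool of `maxThreads` workers and reports the
    // peak number running at once -- it can never exceed the pool size.
    static int peakConcurrency(int maxThreads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try { Thread.sleep(10); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With 4 worker threads, at most 4 of the 20 "requests" run concurrently;
        // the other 16 wait in the queue, adding latency.
        System.out.println("peak concurrency = " + peakConcurrency(4, 20));
    }
}
```

More threads raise the cap but also raise memory and context-switch costs, which is the trade-off behind tuning this value rather than simply maximizing it.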
Implement Circuit Breakers (Resilience4j):
Protect APIs from downstream failures that could reduce throughput.
```java
@CircuitBreaker(name = "userService", fallbackMethod = "fallbackUser")
public User getUser(Long id) {
    return userClient.getUser(id);
}

public User fallbackUser(Long id, Throwable t) {
    return new User(id, "Fallback User", "N/A");
}
```
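The state machine Resilience4j manages can be sketched in a few lines. This simplified breaker counts consecutive failures and fails fast once open; it is an illustration of the idea, not the real library's API:

```java
import java.util.function.Supplier;

// Simplified circuit breaker: after `threshold` consecutive failures it
// "opens" and serves the fallback without calling the backend at all,
// so a dead dependency stops consuming threads and time.
public class TinyBreaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    TinyBreaker(int threshold) { this.threshold = threshold; }

    <T> T call(Supplier<T> backend, Supplier<T> fallback) {
        if (consecutiveFailures >= threshold) return fallback.get(); // open: fail fast
        try {
            T result = backend.get();
            consecutiveFailures = 0; // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback.get();
        }
    }
}
```

Resilience4j adds what this sketch omits: a half-open state that periodically probes the backend, sliding-window failure rates, and metrics.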
Load Balancing & Auto-Scaling:
A single server limits throughput. Scale horizontally using:
Spring Cloud Load Balancer (for microservices)
Kubernetes Auto-Scaling
NGINX Load Balancing
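Round-robin, the default strategy in Spring Cloud LoadBalancer, is simple to sketch in plain Java (the instance URLs are made up):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin instance selection: each request goes to the next server
// in rotation, spreading load evenly. Thread-safe via AtomicInteger.
public class RoundRobin {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> instances) { this.instances = instances; }

    String choose() {
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}
```

Production balancers layer health checks on top, skipping instances that stop responding.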
3. Monitoring & Benchmarking Performance
Use Actuator & Micrometer for Real-time Metrics
Integrate with Prometheus & Grafana for visualization
Load Testing with JMeter or Gatling
Measure requests per second (RPS).
Analyse response time percentiles (P95, P99).
Identify bottlenecks in API calls.
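Percentiles are straightforward to compute from recorded latencies. A minimal nearest-rank sketch with made-up samples shows why P95/P99 matter more than the average:

```java
import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile: the smallest sample that is >= p percent of all samples.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 11, 14, 250, 13, 16, 12, 900, 14}; // two slow outliers
        // The median looks healthy; the high percentiles expose the outliers
        // that a user actually experiences as slow requests.
        System.out.println("P50=" + percentile(samples, 50)
                + " P95=" + percentile(samples, 95)
                + " P99=" + percentile(samples, 99));
    }
}
```

Micrometer computes these percentiles for you when histogram publishing is enabled on a timer; the sketch is only to show what the numbers mean.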
Conclusion: Practical Performance Optimization Strategies
| Issue | Solution |
|---|---|
| Slow Queries | Use Indexing, Connection Pooling, Caching |
| Blocking I/O | Use WebFlux, Async Processing |
| Thread Exhaustion | Increase Thread Pool, Use Netty |
| Large Payloads | GZIP Compression, Optimize JSON |
| Downstream Failures | Use Circuit Breaker (Resilience4j) |
| Low Scalability | Load Balancing, Kubernetes Auto-scaling |
| Lack of Monitoring | Use Actuator, Prometheus, JMeter |
By implementing these strategies, you can build high-performance, scalable, and resilient
Java REST APIs.