Virtual Thread Performance Test with JDK 21 & Spring Boot 3.2

Hello. I'm Kim Young-nam, a backend developer at Payhere.

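If you develop on the Java platform, you have almost certainly heard of Project Loom. Virtual Threads were a hot topic in the second half of last year, and I wanted to know whether they can actually be used in production, so I ran some tests. Note that Spring Boot officially supports Virtual Threads from version 3.2.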

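The advantages of virtual threads are covered in plenty of other places, so I won't repeat them here. Java has traditionally wrapped one OS thread per thread, whereas a Virtual Thread uses its stack memory in small pieces instead of reserving a full OS thread stack, so far more threads can be created. Knowing that one key difference is enough for this post.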
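As a rough illustration of "far more threads", the sketch below starts 10,000 virtual threads that each block briefly. It assumes JDK 21 or later; the function name, the thread count, and the sleep duration are made up for this example and are not part of the tests in this post.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

// Starts `count` virtual threads that each block for a moment and returns how many finished.
// Hypothetical helper for illustration; requires JDK 21+.
fun runManyVirtualThreads(count: Int, sleepMillis: Long = 100): Int {
    val finished = AtomicInteger()
    // One new virtual thread per task; leaving the use block closes the executor,
    // which waits for all submitted tasks to complete.
    Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        repeat(count) {
            executor.execute {
                Thread.sleep(sleepMillis) // blocks only this virtual thread
                finished.incrementAndGet()
            }
        }
    }
    return finished.get()
}

fun main() {
    println(runManyVirtualThreads(10_000))
}
```

Trying the same count with platform threads and the -Xss1M setting used below would ask the OS to reserve on the order of gigabytes of stack space, which is the limit this post is concerned with.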

Even this test may be a little late to the party, but I think we should understand the context precisely before adopting the feature, so

let the test begin!

Test Point

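The key is to observe what happens when more requests arrive concurrently than there are (OS) threads available to serve them
- Constrained environment
- Concurrency
  - Comparison of Kotlin Coroutines and Java Virtual Threads
The test introduces an I/O bottleneck, as in a real workload, to see how many requests can be processed smoothly.
When there are more platform threads than requests, there is naturally no benefit to using Virtual Threads.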

Test Setup

JVM options

-XX:ReservedCodeCacheSize=80M
-XX:MaxDirectMemorySize=10M
-Xmx512M
-Xms512M
-XX:MaxMetaspaceSize=150M
-Xss1M
-Djdk.tracePinnedThreads=full
-Djdk.trackAllThreads=true
-XX:StartFlightRecording:settings=my-profile.jfc

Client Request -> Server1 Request -> Server2(sleep 200ms)
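Server2 is configured with 200 Tomcat threads and uses coroutines so that it can serve as many requests as possible.
Concurrent requests to Server1 are limited to 100.
The number of Tomcat threads on Server1 is restricted.
The code being called is shown below.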

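..
    // Two sequential blocking calls; each one waits about 200 ms on Server2
    return success(ioTest(1) + ioTest(2))
..
// Blocking HTTP GET to Server2 via RestClient; the parameter i only labels the call
private fun ioTest(i: Int): String {
    val responseDTO = restClient.get()
        .uri("http://localhost:8001/api/test-time")
        .accept(MediaType.APPLICATION_JSON)
        .retrieve()
        .body(InternalApiResponseDTO::class.java)

    // body() can return null, so fail fast with !! before casting the payload
    return responseDTO!!.data as String
}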
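The post does not show how the Tomcat thread limit and the Virtual Thread toggle were set for each run. In Spring Boot 3.2 they are normally controlled through application properties, so the configuration for Server1 may have looked roughly like the fragment below. The values are taken from the run titles that follow; the file layout is an assumption.

```properties
# "Virtual Off" runs: platform threads, Tomcat worker pool capped at 20 (or 50)
spring.threads.virtual.enabled=false
server.tomcat.threads.max=20

# "Virtual On" runs: Tomcat handles each request on a virtual thread
# spring.threads.virtual.enabled=true
```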

Running the Tests

Virtual Off & Tomcat Thread 20 & Blocking

Result (mean time per request: 2136.644 ms)

Concurrency Level:      100
Time taken for tests:   21.366 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      440000 bytes
HTML transferred:       76000 bytes
Requests per second:    46.80 [#/sec] (mean)
Time per request:       2136.644 [ms] (mean)
Time per request:       21.366 [ms] (mean, across all concurrent requests)
Transfer rate:          20.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   4.5      0      34
Processing:   423 1999 314.4   2072    2184
Waiting:      423 1998 314.4   2072    2184
Total:        427 2000 311.1   2073    2185

Percentage of the requests served within a certain time (ms)
  50%   2073
  66%   2089
  75%   2095
  80%   2107
  90%   2125
  95%   2146
  98%   2154
  99%   2161
 100%   2185 (longest request)

Virtual Off & Tomcat Thread 50 & Blocking

Result (mean time per request: 903.952 ms)

Concurrency Level:      100
Time taken for tests:   9.040 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      460000 bytes
HTML transferred:       76000 bytes
Requests per second:    110.63 [#/sec] (mean)
Time per request:       903.952 [ms] (mean)
Time per request:       9.040 [ms] (mean, across all concurrent requests)
Transfer rate:          49.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.5      1      10
Processing:   421  815  86.8    832     872
Waiting:      420  814  86.8    831     872
Total:        424  817  86.5    834     873

Percentage of the requests served within a certain time (ms)
  50%    834
  66%    839
  75%    842
  80%    843
  90%    852
  95%    865
  98%    868
  99%    869
 100%    873 (longest request)

Virtual On & Blocking

Result (mean time per request: 500.406 ms)

Concurrency Level:      100
Time taken for tests:   5.004 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      460000 bytes
HTML transferred:       76000 bytes
Requests per second:    199.84 [#/sec] (mean)
Time per request:       500.406 [ms] (mean)
Time per request:       5.004 [ms] (mean, across all concurrent requests)
Transfer rate:          89.77 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.0      1       6
Processing:   407  426  21.8    420     725
Waiting:      406  426  21.6    419     712
Total:        407  428  22.2    421     725

Percentage of the requests served within a certain time (ms)
  50%    421
  66%    425
  75%    427
  80%    430
  90%    455
  95%    478
  98%    494
  99%    501
 100%    725 (longest request)

Platform Thread vs Virtual Thread

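This is the expected result: simply raising the Tomcat thread count clearly improves response time. However, each Tomcat thread is a platform thread that, depending on configuration, uses roughly 0.5 MB to 2 MB of memory, so the number of threads you can create is limited by the machine's memory. A Virtual Thread, by contrast, needs only a small amount of stack memory, so a very large number of them can be created. The JVM also detects blocking I/O on its own: when a Virtual Thread blocks, it is unmounted from its carrier thread, and the ForkJoinPool that schedules Virtual Threads runs another one on that carrier. As the results show, this makes the Virtual Thread run by far the fastest of the three, with a mean time per request of 500.406 ms versus 903.952 ms with 50 Tomcat threads and 2136.644 ms with 20.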
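The three totals can be sanity-checked with simple arithmetic. Each request makes two sequential calls that each wait 200 ms on Server2, so it occupies a worker for at least 400 ms, and only as many requests can run at once as there are workers. The sketch below is a back-of-the-envelope model, not part of the original test code; the helper name is hypothetical, and it ignores connection and scheduling overhead.

```kotlin
import kotlin.math.ceil

// Lower bound on total test time when every request holds a worker for `perRequestMs`
// and at most `workers` requests are in flight. Hypothetical helper for illustration.
fun estimateTotalSeconds(requests: Int, workers: Int, perRequestMs: Int): Double {
    val waves = ceil(requests.toDouble() / workers).toInt()
    return waves * perRequestMs / 1000.0
}

fun main() {
    val perRequestMs = 2 * 200 // two sequential calls, Server2 sleeps 200 ms on each
    println(estimateTotalSeconds(1000, 20, perRequestMs))  // 20 Tomcat threads
    println(estimateTotalSeconds(1000, 50, perRequestMs))  // 50 Tomcat threads
    println(estimateTotalSeconds(1000, 100, perRequestMs)) // virtual threads: bounded by ab's concurrency of 100
}
```

The model gives 20 s, 8 s, and 4 s, which lines up with the measured 21.366 s, 9.040 s, and 5.004 s. In other words, with Virtual Threads the limit is no longer Server1's thread pool but the 100 concurrent requests that ab sends.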

Now let's compare against coroutines.

First, code that creates virtual threads from within a virtual thread and joins on them

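// executorService is not shown in this post; per the description above, it runs each task on a new virtual thread
val future1 = CompletableFuture.supplyAsync({
    ioTest(1)
}, executorService)
val future2 = CompletableFuture.supplyAsync({
    ioTest(2)
}, executorService)

// Both calls are already in flight, so the two get() calls wait about 200 ms in total
return success(future1.get() + future2.get())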
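For readers who want to try the join pattern without the two servers, here is a self-contained sketch. It assumes JDK 21+ and replaces the HTTP call with Thread.sleep; fakeIoCall and callBothInParallel are hypothetical names, and the executor here is created per call only to keep the example short.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors

// Stand-in for the blocking HTTP call: sleeps instead of calling Server2 (assumption).
fun fakeIoCall(i: Int, delayMillis: Long = 200): String {
    Thread.sleep(delayMillis)
    return "r$i"
}

// Runs both calls on separate virtual threads and joins on the results.
fun callBothInParallel(): String =
    Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        val future1 = CompletableFuture.supplyAsync({ fakeIoCall(1) }, executor)
        val future2 = CompletableFuture.supplyAsync({ fakeIoCall(2) }, executor)
        // The two sleeps overlap, so the total wait is close to one delay, not two.
        future1.get() + future2.get()
    }

fun main() {
    val start = System.nanoTime()
    val result = callBothInParallel()
    val elapsedMs = (System.nanoTime() - start) / 1_000_000
    println("$result in $elapsedMs ms")
}
```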

And the Coroutine code, which uses WebClient

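// Starts both calls concurrently and suspends until both results are available
suspend fun ioTestCoroutine(): ResponseDTO<String> = withContext(Dispatchers.IO)  {
    val deferred1 = ioTestAwait(1)
    val deferred2 = ioTestAwait(2)
    success(deferred1.await() + deferred2.await())
}
...
// Launches a non-blocking WebClient call in a new coroutine and returns its Deferred.
// Note: CoroutineScope(Dispatchers.IO) is not tied to the caller, so cancelling the caller does not cancel this call.
private suspend fun ioTestAwait(i: Int): Deferred<String> {
    return CoroutineScope(Dispatchers.IO).async {
        val responseDTO = webClient
            .get()
            .uri("http://localhost:8001/api/test-time")
            .retrieve()
            .bodyToMono(InternalApiResponseDTO::class.java)
            .awaitSingle() // suspends instead of blocking a thread
        responseDTO!!.data as String
    }
}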

Virtual On & Use Virtual Thread Join

Result (mean time per request: 241.797 ms)

Concurrency Level:      100
Time taken for tests:   2.418 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      440000 bytes
HTML transferred:       76000 bytes
Requests per second:    413.57 [#/sec] (mean)
Time per request:       241.797 [ms] (mean)
Time per request:       2.418 [ms] (mean, across all concurrent requests)
Transfer rate:          177.71 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.2      1      20
Processing:   202  214  11.3    209     256
Waiting:      202  213  11.2    209     255
Total:        202  215  11.8    211     258

Percentage of the requests served within a certain time (ms)
  50%    211
  66%    213
  75%    217
  80%    222
  90%    231
  95%    237
  98%    253
  99%    256
 100%    258 (longest request)

Virtual Off & WebClient & Use Coroutine

Result (mean time per request: 244.610 ms)

Concurrency Level:      100
Time taken for tests:   2.446 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      440000 bytes
HTML transferred:       76000 bytes
Requests per second:    408.81 [#/sec] (mean)
Time per request:       244.610 [ms] (mean)
Time per request:       2.446 [ms] (mean, across all concurrent requests)
Transfer rate:          175.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.1      0       6
Processing:   202  219   8.7    220     246
Waiting:      202  218   8.7    219     246
Total:        202  220   9.1    221     247

Percentage of the requests served within a certain time (ms)
  50%    221
  66%    225
  75%    227
  80%    228
  90%    230
  95%    233
  98%    235
  99%    239
 100%    247 (longest request)

Virtual Thread vs. Coroutine

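This, too, is more or less what I expected. There is no perceptible difference between the two, and the actual numbers shifted very slightly from run to run. All of the requests shown here issued their two calls in parallel. I was also curious about the cost of switching threads when blocking happens inside a single virtual thread, so I tested that as well, but it made no real difference either and I left it out.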

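So? What should we use, and how?

For raising performance in Spring MVC with little effort, Virtual Threads look like a good choice.
- With Coroutines you have to write suspendable suspend functions, whereas Virtual Threads need no such effort because blocking is detected automatically, which is convenient.
Since most projects call other services through OpenFeign, which is blocking, Virtual Threads can be expected to be effective there.
It is often said that Virtual Threads have the edge for I/O work, but compared with Coroutines I did not see a meaningful difference in these tests.
The one thing to watch out for is pinned threads. This is an important caveat, so review it thoroughly before adopting Virtual Threads.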
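To make the pinning caveat concrete: in JDK 21, a virtual thread that blocks while inside a synchronized block or method stays pinned to its carrier thread, so the carrier cannot run other virtual threads in the meantime. The -Djdk.tracePinnedThreads=full option from the test setup prints a stack trace when that happens. One common remedy is to guard blocking sections with a ReentrantLock instead. The sketch below shows that pattern under the assumption of JDK 21+; GuardedCounter and runCounter are hypothetical names, and Thread.sleep stands in for real I/O.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Counter whose critical section blocks. A ReentrantLock is used instead of synchronized,
// so a virtual thread that blocks here can unmount from its carrier thread.
class GuardedCounter {
    private val lock = ReentrantLock()
    var value = 0
        private set

    fun incrementSlowly(workMillis: Long) {
        lock.withLock {
            Thread.sleep(workMillis) // stand-in for blocking I/O done while holding the lock
            value++
        }
    }
}

// Runs `tasks` increments, each on its own virtual thread, and returns the final count.
fun runCounter(tasks: Int): Int {
    val counter = GuardedCounter()
    Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        repeat(tasks) { executor.execute { counter.incrementSlowly(1) } }
    }
    return counter.value
}

fun main() {
    println(runCounter(100))
}
```

Libraries on the request path, such as JDBC drivers or HTTP clients, may still use synchronized internally, so it is worth running with the trace option enabled under load before going to production.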