Server Platform benchmarks
In this evaluation we’ve benchmarked MetaFi on a number of workloads that exercise several of the core APIs you’d typically use to build a large-scale multiplayer game.
The workloads have been run on a set of different hardware configurations to demonstrate the performance advantages that come from MetaFi’s modern hardware-friendly and highly-scalable architecture.
As the results demonstrate, MetaFi's performance grows as the hardware size grows. Scaling both up and out offers many advantages, from simplified cluster management to access to generally better hardware and economies of scale.
Methodology
The benchmarks were performed using Tsung, a powerful, distributed load testing tool.
The Tsung workloads benchmark MetaFi in single-node deployment (MetaFi OSS) and clustered mode (MetaFi Enterprise) in a few different configurations, using a single database instance.
The database instance hardware was kept constant throughout all configurations and workloads to ensure there were no bottlenecks on I/O. Although we’ve also tested some database-bound APIs, these benchmarks focus on the capabilities of MetaFi.
The Tsung servers are run on Google Compute Engine (GCE).
No warmup runs were executed before the actual workloads.
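The result tables below report "highest 10sec mean" and "lowest 10sec mean" columns alongside an overall mean. Assuming these follow the usual Tsung convention of aggregating samples over 10-second intervals, the following minimal sketch shows how such figures could be derived from raw latency samples. The function names and the sample data are hypothetical and are not taken from the benchmark.

```python
from collections import defaultdict


def interval_means(samples, window=10.0):
    """Group (timestamp_sec, latency_msec) samples into fixed windows and
    return the mean latency of each non-empty window, in time order."""
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[int(ts // window)].append(latency)
    return [sum(values) / len(values) for _, values in sorted(buckets.items())]


def summarize(samples, window=10.0):
    """Return the highest and lowest window means plus the overall mean."""
    means = interval_means(samples, window)
    latencies = [latency for _, latency in samples]
    return {
        "highest_window_mean": max(means),
        "lowest_window_mean": min(means),
        "mean": sum(latencies) / len(latencies),
    }


# Made-up samples: (seconds since start, latency in msec).
samples = [(1, 20.0), (4, 22.0), (12, 30.0), (18, 34.0), (25, 21.0)]
stats = summarize(samples)
print(stats)
```

Read this way, a single slow interval can raise the highest 10-second mean sharply while the overall mean moves much less.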
Setup
Tsung / Database
The Tsung topology consists of one master and twenty slave nodes. This setup was unchanged across all the benchmark runs and the hardware specification was:
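| | Tsung master | Tsung slaves | Database |
| --- | --- | --- | --- |
| Instance Type | n1-standard-32 | n1-standard-32 | dedicated-core vCPU |
| vCPU / Mem | 6 / 8GB | 3 / 2GB | 8 / 30GB |
| IOPS (read/write) | - | - | 3000 |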
The database was set up on Google CloudSQL.
MetaFi
We’ve run the benchmark workloads against three configurations:
MetaFi OSS
1 Node - 1 CPU / 3GB RAM
MetaFi Enterprise
2 Nodes - 1 CPU / 3GB RAM (per node)
2 Nodes - 2 CPU / 6GB RAM (per node)
All the containers ran on the GCP instance type “n1-standard-32” and were created on the Heroic Cloud platform. The MetaFi nodes are behind a GCP L7 load balancer.
Workloads
The proposed workloads are meant to display MetaFi’s throughput and capacity for effortless production-ready scale.
We’ll present the benchmarking results for the following workloads:
Number of concurrent socket connections (CCU count).
Throughput of new user registration.
Throughput of user authentication.
Throughput of custom RPC call in the Lua runtime.
Throughput of custom RPC call in the JavaScript runtime.
Throughput of custom RPC call in the Go runtime.
Number of authoritative real-time matches using custom match handlers.
Each of the following subsections describes one of these workloads in more detail, followed by the benchmark results gathered by Tsung for each hardware and topology configuration.
Results
Workload 1 - Number of concurrent socket connections (CCU count)
This workload consists of authenticating a user, opening a socket connection to MetaFi, and keeping it open for around 200 seconds.
[Figures: number of connected users, one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
Time to connect
| Hardware | Max Connected | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 20277 | 21.77 msec | 20.69 msec | 687.6 / sec | 137.25 / sec | 21.14 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 29550 | 38.48 msec | 21.78 msec | 1002.9 / sec | 225.98 / sec | 23.29 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 35723 | 29.06 msec | 21.38 msec | 1255.5 / sec | 351.54 / sec | 23.91 msec |
As shown above, a single MetaFi instance with a single CPU core can sustain up to ~20,000 connected users. Scaling up to 2 nodes with 2 CPU cores each, this value goes up to ~35,700 CCU.
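To put the scaling in relative terms, the maximum connected-user counts from the table can be divided by the single-node baseline. This is a small illustrative calculation; the dictionary and function names are hypothetical.

```python
# Max connected users per configuration, copied from the table above.
max_ccu = {
    "1 Node - 1 CPU / 3GB RAM": 20277,
    "2 Nodes - 1 CPU / 3GB RAM (each)": 29550,
    "2 Nodes - 2 CPU / 6GB RAM (each)": 35723,
}


def scaling_factors(results, baseline):
    """Express every result as a multiple of the baseline configuration."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


factors = scaling_factors(max_ccu, "1 Node - 1 CPU / 3GB RAM")
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")
```

The same helper can be applied to the mean rates of the other workloads to compare how each one scales across configurations.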
Workload 2 - Register a new user
This workload emulates the registration of new users through the game server’s device authentication API, which stores the new accounts in the database.
[Figures: throughput (req/s), one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
Request statistics
| Hardware | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 23.28 msec | 19.46 msec | 906.5 / sec | 528.39 / sec | 21.24 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 27.57 msec | 22.57 msec | 1295.4 / sec | 762.64 / sec | 25.75 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 170 msec | 19.59 msec | 1581.5 / sec | 939.35 / sec | 39.63 msec |
As shown above, a single MetaFi server can handle average loads of ~500 requests/sec, with requests served in 21.24 msec (mean), each including a database write for the new user. At the measured mean rate of 528.39 requests/sec, a game can create roughly 1.9 million new players every hour. With 2 nodes of 2 CPU cores each (939.35 requests/sec), this rises to roughly 3.38 million player accounts per hour.
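The hourly figures follow from multiplying a sustained request rate by 3600 seconds. A minimal sketch using the mean rates from the table above (the names are illustrative, and it assumes the mean rate could be sustained for a full hour):

```python
SECONDS_PER_HOUR = 3600

# Mean registration rates (requests/sec), copied from the table above.
mean_rates = {
    "1 Node - 1 CPU / 3GB RAM": 528.39,
    "2 Nodes - 1 CPU / 3GB RAM (each)": 762.64,
    "2 Nodes - 2 CPU / 6GB RAM (each)": 939.35,
}


def per_hour(rate_per_sec):
    """Convert a per-second rate into a per-hour total."""
    return rate_per_sec * SECONDS_PER_HOUR


hourly = {name: per_hour(rate) for name, rate in mean_rates.items()}
for name, total in hourly.items():
    print(f"{name}: {total / 1e6:.2f} million accounts/hour")
```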
Workload 3 - Authenticate a user
This workload consists of authenticating an existing user using the game server’s device authentication API.
[Figures: throughput (req/s), one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
Request statistics
| Hardware | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 27.21 msec | 19.52 msec | 921 / sec | 531.23 / sec | 21.01 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 140 msec | 22.65 msec | 1302.9 / sec | 766.25 / sec | 27.94 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 840 msec | 20.34 msec | 1649.2 / sec | 933.98 / sec | 81.53 msec |
Workload 4 - Custom Lua RPC call
This workload executes a simple RPC function exposed through the Lua runtime. The function receives a payload as a JSON string, decodes it, and echoes it back to the sender.
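The shape of such an echo function can be sketched in a few lines. The example below is a language-neutral illustration written in Python. It is not the Lua code used in the benchmark and does not use any MetaFi runtime API, and the function name is hypothetical.

```python
import json


def echo_rpc(payload: str) -> str:
    """Decode the incoming JSON string and return it re-encoded."""
    data = json.loads(payload)  # raises ValueError if the payload is malformed
    return json.dumps(data)


response = echo_rpc('{"hello": "world"}')
print(response)
```

Because the function does little more than one decode and one encode, its cost is likely small compared with the network round trip, which fits the note further down that this workload is not CPU-bound.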
[Figures: throughput (req/s), one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
Request statistics
| Hardware | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 220 msec | 20.06 msec | 1210.8 / sec | 706.78 / sec | 33.40 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 47.88 msec | 19.56 msec | 1199.4 / sec | 707.00 / sec | 23.67 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 490 msec | 20.80 msec | 1406.3 / sec | 823.74 / sec | 73.13 msec |
Workload 5 - Custom JavaScript RPC call
This workload executes a simple RPC function exposed through the JavaScript runtime. The function receives a payload as a JSON string, decodes it, and echoes it back to the sender.
[Figures: throughput (req/s), one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
Request statistics
| Hardware | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 490 msec | 21.51 msec | 1201.2 / sec | 707.13 / sec | 55.88 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 120 msec | 20.80 msec | 1201.3 / sec | 705.72 / sec | 24.31 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 490 msec | 21.37 msec | 1396.3 / sec | 822.15 / sec | 72.37 msec |
Workload 6 - Custom Go RPC call
This workload executes a simple RPC function exposed through the Go runtime. The function receives a payload as a JSON string, decodes it, and echoes it back to the sender.
[Figures: throughput (req/s), one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
Request statistics
| Hardware | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 120 msec | 19.99 msec | 1192.9 / sec | 705.20 / sec | 27.92 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 23.65 msec | 19.58 msec | 1198.3 / sec | 708.17 / sec | 21.93 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 57.36 msec | 18.89 msec | 1404.6 / sec | 825.40 / sec | 24.39 msec |
As shown above, a single MetaFi server can handle an average of ~700 requests/sec served in 27.92 msec (mean). Compared with Workloads 4 and 5, the results for the Lua, JavaScript, and Go runtimes are very similar. This is because the benchmarked workload does not incur significant CPU computation, so the overhead of the Lua/JavaScript virtual machines has little effect on the results. With CPU-intensive code the performance results would start to differ, as would RAM usage by the Lua/JavaScript runtimes.
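One way to quantify how close the three runtimes are is the relative spread of their single-node mean rates, taken from the tables for Workloads 4, 5, and 6. This is a rough illustrative check rather than a statistical test, and the helper name is hypothetical.

```python
# Mean rates (requests/sec) on 1 Node - 1 CPU / 3GB RAM, from the tables above.
mean_rate = {"Lua": 706.78, "JavaScript": 707.13, "Go": 705.20}


def relative_spread(values):
    """Return (max - min) divided by the average of the values."""
    values = list(values)
    average = sum(values) / len(values)
    return (max(values) - min(values)) / average


spread = relative_spread(mean_rate.values())
print(f"Relative spread of mean rates: {spread:.2%}")
```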
Workload 7 - Custom authoritative match logic
This workload emulates a real-time multiplayer game running on MetaFi’s server-authoritative multiplayer engine. Although the client and custom logic are not an actual multiplayer game, the code approximates a real use case in terms of the messages exchanged between the server and the connected game clients. We’ll briefly explain the server and client logic in this workload.
Server side logic
The server runs multiplayer matches with a tick rate of 10 ticks per second. Each match can have a maximum of 10 players.
The server implements an RPC call that the client can query to get the ID of an ongoing match (with fewer than 10 players). When this API is invoked, the server uses the Match Listing feature to look for matches that are not full and returns the first result. If no match is found, a new one is created.
The match loop logic is simple; the server expects to receive one of two opcodes from the client and performs either of the following actions:
Echo back the received message to the client.
Broadcast the message to all of the match participants.
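The server behaviour described above can be approximated with a small in-memory model. This is a sketch in Python, not the benchmark's actual match handler. The class and method names are hypothetical, no MetaFi API is used, and the mapping of opcode 1 to echo and opcode 2 to broadcast is an assumption based on the order of the two actions listed.

```python
from dataclasses import dataclass, field

MAX_PLAYERS = 10
OP_ECHO = 1       # assumed: opcode 1 echoes the message back to the sender
OP_BROADCAST = 2  # assumed: opcode 2 broadcasts to all match participants


@dataclass
class Match:
    match_id: str
    players: set = field(default_factory=set)


class MatchServer:
    def __init__(self):
        self.matches = {}  # match_id -> Match, kept in creation order

    def find_or_create_match(self):
        """RPC: return the first match that is not full, else create one."""
        for match in self.matches.values():
            if len(match.players) < MAX_PLAYERS:
                return match.match_id
        match_id = f"match-{len(self.matches) + 1}"
        self.matches[match_id] = Match(match_id)
        return match_id

    def join(self, match_id, player):
        match = self.matches[match_id]
        if len(match.players) >= MAX_PLAYERS:
            raise ValueError("match is full")
        match.players.add(player)

    def tick(self, match_id, inbox):
        """Process one tick (the benchmark runs 10 ticks per second).

        inbox is a list of (sender, opcode, data) tuples; the return value
        is a list of (recipient, data) deliveries.
        """
        match = self.matches[match_id]
        deliveries = []
        for sender, opcode, data in inbox:
            if opcode == OP_ECHO:
                deliveries.append((sender, data))
            elif opcode == OP_BROADCAST:
                deliveries.extend((p, data) for p in sorted(match.players))
        return deliveries


server = MatchServer()
match_id = server.find_or_create_match()
for player in ("alice", "bob", "carol"):
    server.join(match_id, player)
deliveries = server.tick(
    match_id, [("alice", OP_ECHO, "ping"), ("bob", OP_BROADCAST, "hello")]
)
print(deliveries)
```

In this model a broadcast also reaches the sender, which is one reading of "all of the match participants". A real handler may exclude the sender.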
Client side logic
The client logic is also simple; each game client performs the following steps in order:
1. Authenticate an existing user with MetaFi to receive a token.
2. Execute the server RPC function to receive the ID of an ongoing match that is not full.
3. Establish a websocket connection with the real-time API.
4. Join the match with the ID received in step 2.
5. For 180 seconds, send a message every half second, alternating between opcode 1 and opcode 2.
The messages sent by the client carry a fixed-size string payload: 44 characters for opcode 1 and 35 characters for opcode 2.
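The message pattern of a single client session can be laid out as a schedule. The sketch below only models the timing and payload sizes described above; it does not open any connection. Starting with opcode 1 and the filler payload content are assumptions, and the names are hypothetical.

```python
DURATION_SEC = 180
INTERVAL_SEC = 0.5
PAYLOAD_CHARS = {1: 44, 2: 35}  # payload length per opcode, from the text


def message_schedule(duration=DURATION_SEC, interval=INTERVAL_SEC):
    """Return (send_time_sec, opcode, payload) tuples for one client session."""
    count = int(duration / interval)
    schedule = []
    for i in range(count):
        opcode = 1 if i % 2 == 0 else 2  # assumption: the loop starts with opcode 1
        payload = "x" * PAYLOAD_CHARS[opcode]  # filler; actual content is not specified
        schedule.append((i * interval, opcode, payload))
    return schedule


schedule = message_schedule()
payload_chars = sum(len(payload) for _, _, payload in schedule)
print(len(schedule), payload_chars)
```

Under these assumptions each client sends 360 messages per session, split evenly between the two opcodes, before any protocol framing is added.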
[Figures: number of connected users, one plot per configuration (1 Node - 1 CPU / 3GB RAM; 2 Nodes - 1 CPU / 3GB RAM per node; 2 Nodes - 2 CPU / 6GB RAM per node)]
These results are averaged over every request made by the client. Because each client session involves authentication, an RPC call, a websocket connection, and messages sent through that connection, the figures take into account the entire set of requests performed within each client session.
Request statistics
| Hardware | Highest 10sec mean | Lowest 10sec mean | Highest Rate | Mean Rate | Mean |
| --- | --- | --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | 33.79 msec | 0.975 msec | 130.9 / sec | 39.20 / sec | 15.84 msec |
| 2 Nodes - 1 CPU / 3GB RAM (each) | 2.49 sec | 1.24 msec | 208.7 / sec | 57.92 / sec | 79.85 msec |
| 2 Nodes - 2 CPU / 6GB RAM (each) | 0.23 sec | 1.18 msec | 342.7 / sec | 103.18 / sec | 42.60 msec |
The table below shows the network throughput handled by the game server for the data messages exchanged within the matches. The number of bytes received by the clients is much higher than the number of bytes sent because 50% of the messages sent by clients trigger a broadcast from the server to all match participants, as noted above.
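Network Throughput
| Hardware | Sent/Received | Highest Rate | Total |
| --- | --- | --- | --- |
| 1 Node - 1 CPU / 3GB RAM | Sent | 6.99 Mbits/sec | 237.49 MB |
| 1 Node - 1 CPU / 3GB RAM | Received | 47.81 Mbits/sec | 1.54 GB |
| 2 Nodes - 1 CPU / 3GB RAM (each) | Sent | 11.53 Mbits/sec | 392.80 MB |
| 2 Nodes - 1 CPU / 3GB RAM (each) | Received | 62.34 Mbits/sec | 1.97 GB |
| 2 Nodes - 2 CPU / 6GB RAM (each) | Sent | 18.86 Mbits/sec | 641.29 MB |
| 2 Nodes - 2 CPU / 6GB RAM (each) | Received | 136.41 Mbits/sec | 4.43 GB |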
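The imbalance between received and sent traffic can be expressed as a ratio of the totals in the table above. The sketch below assumes 1 GB = 1000 MB, since the report does not say which convention is used; the names are illustrative.

```python
MB_PER_GB = 1000  # assumption: decimal units; the report does not specify

# (sent in MB, received in MB) per configuration, from the table above.
totals_mb = {
    "1 Node - 1 CPU / 3GB RAM": (237.49, 1.54 * MB_PER_GB),
    "2 Nodes - 1 CPU / 3GB RAM (each)": (392.80, 1.97 * MB_PER_GB),
    "2 Nodes - 2 CPU / 6GB RAM (each)": (641.29, 4.43 * MB_PER_GB),
}


def received_to_sent(sent_mb, received_mb):
    """Return how many bytes the clients received per byte they sent."""
    return received_mb / sent_mb


ratios = {name: received_to_sent(*pair) for name, pair in totals_mb.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x more received than sent")
```

A ratio well above 1 is what the broadcast fan-out would suggest: every broadcast message a client sends is delivered to up to 10 participants.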