
Developer Tracker: Update on PvP Queue Issues

In recent weeks there have been increasing reports of instability in the PvP queues. Players have been sharing their recent experiences with the issue and calling on ArenaNet for a quick fix.

On July 10, a small patch was released that was intended to curb the problem. In the official forum, the developers now go into detail about the changes they made.

Server programmer gives a technical look at PvP queues

Server programmer Robert Neckorcuk announced that PvP queue stability has improved noticeably since the changes.

Neckorcuk also gives a brief look at the technical infrastructure behind the PvP queue system. He explains that a large part of the technical communication runs through so-called "micro-services": pieces of code that talk to one another and exchange information.

Apparently, however, when a "gateway" (a connection between micro-services) was terminated, no further updates could be received, leaving players stuck in the queue.

By putting a new system in place, Neckorcuk was able to make several improvements to the communication between services. As a result, the PvP queues now run much more stably.

If you are technically inclined, you can read Neckorcuk's full post below:

Forum post

Hello Again PvP Community!

I wanted to provide you with a status update on the queue instability issues.

For many of you, noticed or not, there has been a noticeable increase in the reliability of the queue system (about 10x) since we deployed a change last Wednesday, July 10. That being said, we are still seeing two additional types of "stuck" screens that we are continuing to dive into.

So what fix did we push out last week? Depending on how closely you follow ours or industry tech, you may know that our infrastructure is built using micro-services. Each service deals with (ideally) one core task, and can talk to other micro-services through messaging. The micro-service that handles arena-based PvP is called PvpSrv. (Go figure…) When creating objects (arenas, matches, rosters, etc.), PvpSrv will "talk" to other services to persist the current data and state of each of these objects.

For some clusters of micro-services, each service is able to talk to others directly, no middle men or gatekeepers or anything. Some micro-services however live in different clusters. For PvpSrv to talk to some of these services, it must make a connection to a "gateway" micro-service, and that gateway will forward the message to the appropriate micro-service in a different cluster. This all works well for the case of a few micro-services sending a few messages, but PvpSrv is not the only service talking cross-cluster. We have… several… gateways that handle the traffic of… several… micro-services.
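The cross-cluster forwarding described above can be sketched roughly like this. This is a minimal illustration, not ArenaNet's actual code; the class and service names (`MicroService`, `GatewayService`, `PersistSrv`) are invented for the example.

```python
# Hypothetical sketch: a service in one cluster cannot reach a service
# in another cluster directly, so it hands messages to a gateway that
# forwards them on its behalf.
class MicroService:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.inbox = []

    def receive(self, msg):
        self.inbox.append(msg)


class GatewayService:
    """Forwards messages from one cluster to services in another."""

    def __init__(self, remote_services):
        self.remote = {s.name: s for s in remote_services}

    def forward(self, msg):
        # Look up the target service in the remote cluster and deliver.
        self.remote[msg["target"]].receive(msg)


# PvpSrv lives in cluster A; the (invented) PersistSrv lives in cluster B.
persistence = MicroService("PersistSrv", cluster="B")
gateway = GatewayService([persistence])

# Instead of a direct connection, PvpSrv sends through the gateway.
gateway.forward({"target": "PersistSrv", "body": "roster state update"})
print(len(persistence.inbox))  # 1
```

The key point the post makes is that the gateway, not the sender, knows the remote topology, which is also why a dying gateway can strand in-flight replies.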

So there's our background: PvpSrv, when setting up a player in a new roster, will send messages to some local micro-services for data and persistence, and will send messages through gateways for additional data and state persistence. How was this causing "stuck" rosters? PvpSrv config was set up to use "round robin" gate connections; each roster would get its state updates through a different gateway. (e.g. If we had 4 gateways, 25% of all rosters would be on gateway 1, 25% of rosters on gateway 2, etc.) This worked well for distributing the message load, but didn't work so well for restoration and resilience.
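The round-robin assignment the post describes can be sketched in a few lines. Again a hedged illustration with invented names (`RoundRobinRouter`), not the real PvpSrv config; it only shows how the 25%-per-gateway spread in the example arises.

```python
import itertools

# Sketch of round-robin gateway assignment: each new roster is bound
# to the next gateway in the pool, cycling forever.
class RoundRobinRouter:
    def __init__(self, gateways):
        self._cycle = itertools.cycle(gateways)
        self.assignment = {}  # roster id -> gateway name

    def register(self, roster_id):
        gw = next(self._cycle)
        self.assignment[roster_id] = gw
        return gw


router = RoundRobinRouter(["gw1", "gw2", "gw3", "gw4"])
for roster_id in range(8):
    router.register(roster_id)

# With 4 gateways, the load spreads evenly: 25% of rosters per gateway.
counts = {}
for gw in router.assignment.values():
    counts[gw] = counts.get(gw, 0) + 1
print(counts)  # {'gw1': 2, 'gw2': 2, 'gw3': 2, 'gw4': 2}
```

This spreads message load nicely, but it also means a single gateway failure strands exactly the slice of rosters bound to it, which is the resilience problem the next paragraph walks through.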

There are many reasons why a service can restart: hardware can die (much less likely), or a network can disconnect (more common than you think). In the case of PvpSrv talking to the gateways, if and when a gateway connection terminated, PvpSrv would have all the rosters re-connect to the new pool of available gateways. The majority of rosters would retain their existing connection. However, rosters that were talking through the terminated gateway would create a new connection to another gateway, but the micro-service they were talking to would not know where to send any in-progress response messages. If a state update was made, the backing service would now be sending a message to a gateway that may or may not be connected to a given roster object. Then of course, the roster object would miss its state update, and it would, well, stick.

In terms of code changes, the actual change was very simple: instead of round-robin assignment, PvpSrv now connects to one gateway with a single connection. If and when this one connection is severed, PvpSrv will connect to another, single gateway. All rosters are associated with the single gateway, and the backing micro-services have only one location through which to send messages.
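The fix can be sketched as a single active connection with whole-pool failover. This is an assumption-laden illustration (the `SingleGatewayRouter` name and pool handling are invented), meant only to contrast with the round-robin scheme described earlier.

```python
# Sketch of the described fix: one active gateway connection shared by
# all rosters; on disconnect, everyone fails over to the same
# replacement gateway, so backing services always have one return path.
class SingleGatewayRouter:
    def __init__(self, gateway_pool):
        self._pool = list(gateway_pool)
        self.active = self._pool.pop(0)

    def route(self, msg):
        # Every roster's message goes through the one active gateway.
        return (self.active, msg)

    def on_disconnect(self):
        """Fail over: all rosters move together to the next gateway."""
        if self._pool:
            self.active = self._pool.pop(0)
        return self.active


router = SingleGatewayRouter(["gw1", "gw2", "gw3"])
first = router.route("state update")[0]   # "gw1"
router.on_disconnect()
second = router.route("state update")[0]  # "gw2"
print(first, second)
```

The trade-off implied by the post: the single connection no longer balances load across gateways, but after a failover there is exactly one place responses can land, so no roster is left listening on a dead connection.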

This was a great find, and I am glad to have seen the incident count drop dramatically over the past week. As stated, we still have some work to do, and are currently eyes deep in an issue surrounding map voting and sticking progress.

We hope this and other upcoming changes positively impact your PvP experiences!

-R


Aliricca

Guild Wars veteran and lore fanatic since 2006. As a contributor for Guildnews, my main focus is the lore of the GW universe and speculation about its expansion. With roleplaying experience in my baggage, I want to take you on a journey through 11,330 years of Tyria's history. Reachable in-game at: Aliricca.1902
