Hi Spp (& everyone else!),
We’ve done extensive post-mortems and analysis of the server dump logs. We’ve even had third parties review what caused the whole system to grind to a halt after receiving only 22 message packets from the outside world (given that we’ve handled up to 267 million per second in the past, 22 is a frankly pathetic number).
The answer, however, is that we simply don’t know what caused the lock-up.
In classic British understatement, this is “unfortunate”.
It could be that there’s an issue in the AoA codebase somewhere (which seems unlikely, as the network and server code has been fixed for some time now and has worked perfectly every time, even under extraordinary load).
It could be an issue with the number of cores we dedicated to this particular test (just to see if we could, we ran this last test on a machine with a single CPU core), and perhaps the cross-CPU needs of some threads caused the seizure.
It could be that we were running this test on Windows Server 2016 (which is not yet publicly available) and there’s an error down there in the depths of the kernel that we’re unaware of (and presumably MS as well, given it’s supposed to be going RTM in August!).
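For the technically curious, here’s a rough illustration of the single-core idea. This is only a sketch of how a Windows process can be confined to one core - it’s not our actual test harness, and everything in it is made up for the example:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Restrict the current process (and every thread it creates) to
        // logical processor 0, mimicking a single-core machine.
        DWORD_PTR singleCoreMask = 1; // bit 0 => logical processor 0
        if (!SetProcessAffinityMask(GetCurrentProcess(), singleCoreMask))
        {
            std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
            return 1;
        }
        std::printf("Pinned to logical processor 0 - any thread that assumes it\n"
                    "can run on another core gets scheduled back onto this one.\n");
        // ... normally the server loop would start here ...
        return 0;
    }

Any thread that quietly assumes it can run alongside another thread on a different core ends up time-slicing on the same one, which is exactly the kind of situation where a latent ordering or contention bug can show itself.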
We can’t replicate the error, despite having tried quite hard to do so. We wish we could replay all the data that was coming into the server - especially packet #22 - but we can’t, as all the client/server comms are SSL encrypted, so there’s no ‘do-over’ available to reproduce the bug.
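It doesn’t help with the test that’s already happened, but for future runs one way to make an encrypted capture inspectable afterwards (assuming an OpenSSL-based TLS stack, which is purely my assumption for this sketch) is a keylog callback, which writes the per-session secrets in the standard format that tools like Wireshark can use to decrypt a packet capture taken alongside the run:

    #include <openssl/ssl.h>
    #include <cstdio>

    static FILE* g_keylog = nullptr;

    // Called by OpenSSL once per session secret; "line" is already in the NSS
    // key log format Wireshark understands (requires OpenSSL 1.1.1 or later).
    static void keylog_cb(const SSL* /*ssl*/, const char* line)
    {
        if (g_keylog)
        {
            std::fprintf(g_keylog, "%s\n", line);
            std::fflush(g_keylog);
        }
    }

    int main()
    {
        SSL_CTX* ctx = SSL_CTX_new(TLS_server_method());
        g_keylog = std::fopen("playtest_keys.log", "a"); // hypothetical file name
        SSL_CTX_set_keylog_callback(ctx, keylog_cb);

        // ... accept TLS connections with ctx as usual; a raw packet capture
        // taken during the run can later be decrypted with the logged keys ...

        SSL_CTX_free(ctx);
        if (g_keylog) std::fclose(g_keylog);
        return 0;
    }

Again: that’s an illustration of the general technique, not a description of how our network layer is actually built.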
We have both Intel & Microsoft on board to help track this down. It’s quite important, even outside of the very important universe that is AoA.
That’s the bad news, but now for the better news…
After discussions, we have collectively found something that might be a candidate for the cause of the error we experienced in the last playtest.
We can’t tell you what we think it is, because it’s quite deep down there - inside things way, way out of our control.
We have, however, decided to run the next playtest using the same codebase and infrastructure as this last test - simply with additional logging procedures in place - to see if the problem recurs and, if it does, hopefully capture the specific inbound messages that caused it.
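To give a flavour of what that “additional logging” can look like (a minimal sketch, not code from the actual AoA server - all names and sizes are illustrative): a small ring buffer of recently received, already-decrypted messages that can be flushed to disk the moment a hang is suspected.

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <mutex>
    #include <string>
    #include <vector>

    // Keeps the last N decrypted inbound messages in memory; flush() writes
    // them out (oldest first, length-prefixed) so a suspicious run could be
    // replayed against a test build later.
    class InboundMessageLog
    {
    public:
        void record(const std::vector<std::uint8_t>& payload)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            slots_[next_ % slots_.size()] = payload;
            ++next_;
        }

        void flush(const std::string& path)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::ofstream out(path, std::ios::binary);
            const std::size_t count = next_ < slots_.size() ? next_ : slots_.size();
            const std::size_t start = next_ - count;
            for (std::size_t i = 0; i < count; ++i)
            {
                const auto& msg = slots_[(start + i) % slots_.size()];
                const std::uint32_t len = static_cast<std::uint32_t>(msg.size());
                out.write(reinterpret_cast<const char*>(&len), sizeof(len));
                out.write(reinterpret_cast<const char*>(msg.data()),
                          static_cast<std::streamsize>(msg.size()));
            }
        }

    private:
        std::mutex mutex_;
        std::array<std::vector<std::uint8_t>, 1024> slots_;
        std::size_t next_ = 0;
    };

    // Usage sketch: call record() right after decryption for every inbound
    // message, and flush("hang_dump.bin") from a watchdog when the server
    // stops responding.

The idea being: if message #22 (or its successor) strikes again, we should have the exact bytes that did it.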
We will also have a fallback plan in place (a different codebase, with a different configuration) so that, if we can’t get the system up within 10 or 15 seconds, we can redirect all the players to a “known good” hardware/OS/software configuration that we’ve used in the past, so you can all enjoy a playtest anyway.
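Conceptually the fallback logic is very simple - something along these lines, where the health probe and the redirect hook are entirely hypothetical stand-ins for whatever the login service actually does:

    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Hypothetical hooks - in reality these would talk to the login service
    // or matchmaker; stubbed here so the sketch stands alone.
    static bool primary_server_is_healthy() { return false; } // pretend it never comes up
    static void redirect_players_to_known_good_cluster()
    {
        std::puts("Redirecting players to the known-good configuration...");
    }

    int main()
    {
        using namespace std::chrono;
        const auto deadline = steady_clock::now() + seconds(15);

        // Give the experimental configuration up to ~15 seconds to come up.
        while (steady_clock::now() < deadline)
        {
            if (primary_server_is_healthy())
            {
                std::puts("Experimental configuration is up - carrying on.");
                return 0;
            }
            std::this_thread::sleep_for(seconds(1));
        }

        // It didn't come up in time: fail over so the playtest happens anyway.
        std::puts("Experimental configuration didn't come up in time - failing over.");
        redirect_players_to_known_good_cluster();
        return 0;
    }

Either way, you get a playtest: best case on the instrumented build, worst case on the configuration we already know works.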
Hope that explains things, as far as I’m able to explain them.
Regards,
SC