Gaming is becoming a big data problem, as online multiplayer sessions and always-on consoles let developers track players like never before. Square Enix started doing this in 2007, according to Jim Blackhurst, because the company's developers were "interested to see if the decisions we had made as designers were being played out in reality".

"When we approach new titles, we sit down with the developers and ask them to put a hypothesis together. For the new Tomb Raider for instance, the designer will come back and say 'I reckon what people will do is they'll go this way through the level', or 'they're going to end up at the end of this level with this much ammo left'. The data that is then sent to us is defined, it all goes to answer a question somewhere."
It's a system that came about gradually, as developers realised that enough players were playing online (or syncing their single-player saves with Square Enix's servers) to yield meaningful statistics.
Blackhurst said: "We wanted to understand how people flowed through levels, so we set out on a bit of a big data adventure. We went online first with enterprise stuff, the kind of thing you'd find in a big bank, and we ended up giving ourselves a denial of service attack on many occasions."
Blackhurst and Tomas Jelinek, both from Square Enix Europe's online operations team, first faced the scale of the problem with Just Cause 2. Jelinek explains: "The idea was, 'OK, we're collecting this data now, so what can we use the data for?' Just Cause 2 was an open world and you could do anything -- you could do stunts, jump from buildings, steal cars, shoot things -- and all that was trapped in the metrics. We wanted to feed it back and create some internal competitions, like, 'hey, you're playing Just Cause 2 as well, try to do more headshots than this other player'. But it was a nightmare to do."
Neither Square Enix's servers nor its internet connection could handle the amount of data coming through. "You'd collect the data and return it to the player, the deadline was 30 minutes but sometimes, especially at the lunch peak, it was struggling and took much longer than that," Jelinek said. "I remember I didn't sleep for something like five days, trying to keep the servers running and collecting the data". Gamers' profiles would regularly go out of sync, with even banal stats like the number of headshots in a gunfight going missing.
By the time Deus Ex: Human Revolution shipped, Square Enix had developed the infrastructure it needed to both absorb all the information that was coming in and to analyse it in a meaningful way. Jelinek said: "Jim was asked to produce some statistics like how many kilometres players drove, and he had to look at all the data we had gathered since the beginning. The kernel ran and it took something like three weeks to run that single query."
Now, with a MongoDB database hosted on a cloud server, that same query takes around two minutes. The implications for how Square Enix develops games are intriguing, because the new infrastructure significantly simplifies and speeds up much of a game's development. What was initially meant to be a reactive way of tracking player activity became something more proactive: a way of collecting and archiving information on actual gameplay elements. It lets Square Enix's studios create what are essentially templates, features their developers know have already been tested and shown to work on other titles.
"When someone wants to add something to a new title we can accommodate that with the online suite that has been programmed by a specific team," said Jelinek. "They don't have to reinvent the wheel. They know they can come to us and say 'we have this new game, it's going to have this many players playing it, it's going to have these features, set it up for us', and we just do that using our tools which we built over the years, and it's done."
That means developers don't have to design their own leaderboard mechanisms, for instance. Square Enix's online suite team, based in Montreal, organises and standardises them, and whenever a new title introduces a new feature, that feature is added to the suite. Blackhurst cites the smartphone app that accompanies Hitman: Absolution as an example of a new idea whose functionality is now available for other development teams to reuse. The central database can process the metrics traffic flowing between the app and the main game quickly enough to make the idea practical.
The danger with this approach, though, is that it could encourage a formulaic, unimaginative design process. That may be great for blockbuster, big-name titles, but it is a tempting shortcut that could lead to cookie-cutter design.
Blackhurst doesn't agree: "There's a healthy amount of competition among the studios themselves, to push the boundaries of what's possible. I don't think there's any complacency. Everybody likes to sneak something in."
Another advantage that Jelinek points to is that advanced tracking of gamers makes it much easier to plan what goes into patches. He said: "Imagine you're playing the game, and we know there's a problem somewhere that can potentially cause you headaches or crash the game. We can look at the majority of players and know they're not there yet. So we know we should probably work harder on getting that fixed before most of the people get there. Hardcore players will finish a game in two days and they start playing something else, but most players will take time, they will explore, so we know we have time."
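The reasoning Jelinek describes boils down to a simple calculation on progress telemetry. The sketch below is illustrative only: it assumes each player's metrics report the furthest checkpoint reached as an integer, and the function name and sample data are hypothetical, not Square Enix's actual tooling.

```python
from collections.abc import Iterable


def share_not_yet_reached(furthest_checkpoints: Iterable[int], bug_checkpoint: int) -> float:
    """Return the fraction of players whose furthest checkpoint is before the buggy one.

    Hypothetical helper: assumes telemetry gives one integer per player.
    """
    progress = list(furthest_checkpoints)
    if not progress:
        return 0.0  # no telemetry yet, so nobody is known to be behind the bug
    behind = sum(1 for cp in progress if cp < bug_checkpoint)
    return behind / len(progress)


# Hypothetical telemetry: furthest checkpoint reached by each of ten players.
players = [2, 3, 3, 4, 5, 5, 6, 9, 12, 12]
share = share_not_yet_reached(players, bug_checkpoint=8)
print(f"{share:.0%} of players have not reached checkpoint 8 yet")
```

If most players are still behind the problem area, as in this made-up sample, a fix can reasonably be scheduled for a later patch rather than rushed out.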
More efficient patching, with smaller but more frequent updates, is a model that's familiar in mobile gaming, and it's the model Blackhurst believes the gaming industry as a whole is converging on. "As we move forward to more connected titles, titles where we have iterative development, pushing updates at a much faster cadence than we would do with normal titles, it's essential," he explained. "That's a different model to what we have today with our console process, but you look at mobile and free-to-play on the web, it's there already -- it's been there for years."
"Within the next couple of months we're going to be a petabyte business, and within 12 months we'll be a multi-petabyte business," Blackhurst adds. "How do you extract value from that?"