In this article, I’ll tell you a story about how our team has created a multiplatform, full-fledged game engine using MVI architecture, fully in Kotlin! You will also learn a lot about how I implemented some insane requirements from our customer when working on said engine. So let’s jump right in!

So, I’m working on an app called Overplay. It is similar to TikTok, but the videos you see are actually games that you can play as you scroll. One day, I was just painting another button when the customer came to me to discuss the app’s performance and the experience of starting and finishing a game. In short, the problem we had for years was the legacy game engine.

It was still using XML on Android and contained 7 thousand lines of legacy code, most of which was dead but still executed, tanking performance. The experience was not fluid, games loaded slowly (20 seconds to load a game was a regular occurrence for us), and gameplay was laggy. We also had a lot of nasty crashes related to concurrency and state management, because dozens of different parts of the engine wanted to send events and update the game state simultaneously… The team had no idea how to solve those issues — our simple MVVM architecture was not holding up at all. The ViewModel alone contained 2,000 lines of code, and any change blew something else up.

So the customer said — time to make the game engine great again. But the new requirements he wanted implemented were just bonkers:

  • The game engine must be embedded directly into the feed of games to let the user scroll away once they finish the game. It means it has to be inside another page and bring all the logic with it!
  • The game engine must start games in less than 2 seconds flat. This means that everything has to be managed in parallel and in the background as the user scrolls!
  • If the user replays or restarts the game, the loading must be instant. Thus, we have to keep the engine running and manage the resources dynamically.
  • Every single action of the user must be covered by analytics to keep improving it in the future.
  • The game engine must support all sorts of videos, including local ones for when someone wants to make their own game and play it.
  • Since the user scrolls through videos like on TikTok, we need to efficiently free and reuse our media codecs and video players to seamlessly jump back and forth between playing a game video and other items of the feed.
  • All errors must always be handled, reported, and recovered from to ensure we no longer ruin the users’ experience with crashes.

I’m gonna be honest, I thought I was gonna get fired.

“There’s no way to implement this crazy logic” — I thought. Half of the app must be easily embeddable and the state must always be consistent, with hundreds of state updates going on at the same time: the device sensors, our graphic engine, the video player, and more. Everything has to be reused everywhere and loaded in parallel. To put the last nail in the coffin, the amount of code has to be kept small as well to let the team make future changes to the engine without shooting themselves in the foot.

But I had to do it, there was no way to avoid it this time. Of course, I couldn’t have done this alone. Huge props to the team:

  • One member took our graphics engine and made it compatible with Compose since there was no way we were doing that without Compose.
  • Another developer spent time making a module for the Game Loop which sends events and orchestrates the graphics engine.
Preparations

So I was responsible for game loading and overall integration in the end. And I thought — well, these requirements are not about features, they are about architecture. My task was to implement the architecture that supports all of those. Easier said than done though…

Here’s a simplified diagram of what my final architecture looked like:

Pardon me for my UML skills — they’ve gotten rusty so not all standards are followed here.

An important thing to understand before we begin is that to implement the new architecture, I got inspired by Ktor and their amazing system of “Plug-Ins” that form a chain of responsibility and intercept any incoming and outgoing events. Why not use this for any business logic, I thought? This is a new approach to app architecture because we used to only do this kind of thing with CQRS on the backend or in networking code.
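
To make the idea concrete, here is a minimal, self-contained sketch of the pattern (hypothetical names, not Ktor's or FlowMVI's actual API): each plugin in the chain gets a chance to observe, transform, or swallow an event before it reaches the core logic.

```kotlin
// Minimal chain-of-responsibility sketch. A plugin may pass an event on
// (possibly transformed) or swallow it by returning null.
fun interface Plugin<E : Any> {
    fun intercept(event: E): E?
}

class Pipeline<E : Any>(private val plugins: List<Plugin<E>>) {
    // Runs the event through every plugin in order; stops if one swallows it.
    fun process(event: E): E? {
        var current = event
        for (plugin in plugins) {
            current = plugin.intercept(current) ?: return null
        }
        return current
    }
}
```

Because each plugin is an isolated unit, cross-cutting concerns like logging or analytics become one more element in the list rather than code woven through the business logic.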

Luckily, this was already implemented in the architectural framework we were using — FlowMVI — so I didn’t need to write any new code for this, I just needed to use the plugin system creatively now. But the framework was meant for UI, not game engines! I had to make some changes to it if I didn’t want to get fired.

So over the next two weeks, I spent time implementing the supporting infrastructure:

  • I added a bunch of new plug-ins that allow me to inject any code into any place in the game engine’s lifecycle. We’ll talk about those in a moment.
  • I ran benchmarks hundreds of times, comparing the performance with the fastest solutions to ensure we get maximum performance. I worked on the code until I optimized the library to the point that it became top-5 in performance among 35+ frameworks benchmarked, and as fast as using a simple Channel (from coroutines).
  • I implemented a new system for watching over the chain of plugin invocations that lets me transparently monitor any business logic. I very creatively named it “Decorators”.

I also set a requirement for myself — ANY piece of logic must be a separate unit in the engine’s code that can be removed or modified on demand, not piled into the same class. My goal: keep the engine’s code under 400 lines.

This felt like arming myself to the teeth as some secret ops dude from a movie. I was ready to crush this.

Let’s go.

Getting Started — Contract

First of all, let’s define a simple family of MVI states, intents and side-effects for our engine. I used FlowMVI’s IDE plugin, typed fmvim in a new file, and got this:

internal sealed interface GameEngineState : MVIState {

    data object Stopped : GameEngineState
    data class Error(val e: Exception?) : GameEngineState
    data class Loading(val progress: Float = 0f) : GameEngineState

    data class Running(
        val game: GameInstance,
        val player: MediaPlayer,
        val isBuffering: Boolean = false,
    ) : GameEngineState
}

internal sealed interface GameEngineIntent : MVIIntent {
    // later added lots of...
}

internal sealed interface GameEngineAction : MVIAction {
    data object GoBack : GameEngineAction
}

I also added a Stopped state (since our engine can exist even when not playing), and a progress value to the loading state.
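
The intent list is elided above, so just to make the contract concrete, here is a hypothetical shape it could eventually take. These specific intents are my invention (except ReplayedGame, which shows up later in the article), and the standalone sketch omits the MVIIntent supertype:

```kotlin
// Purely illustrative: a game engine's intent family might look like this.
sealed interface GameEngineIntent {
    data object ReplayedGame : GameEngineIntent
    data object PauseClicked : GameEngineIntent
    data class ObjectTapped(val objectId: Long) : GameEngineIntent
}
```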

Configuring our Engine

I started by creating a singleton called Container, which will host the dependencies. We have to keep it as a singleton and start/stop all its operations on demand to support instant replay of games and caching. We’re going to try and install a bunch of plugins in it to manage our logic. So, to create it, I typed fmvic in an empty file and then added some configuration:

internal class GameEngineContainer(
    private val appScope: ApplicationScope,
    userRepo: UserRepository,
    configuration: StoreConfiguration,
    pool: PlayerPool,
    // ...a bunch of other stuff
) : Container<GameEngineState, GameEngineIntent, GameEngineAction>, GameLauncher {

    override val store by lazyStore(GameEngineState.Stopped) {
        configure(configuration, "GameEngine") // (1)
        configure {
            stateStrategy = Atomic(reentrant = false) // (2)
            allowIdleSubscriptions = true 
            parallelIntents = true // (3)
        }
    }
}

This way, we can easily inject the dependencies here. The “Store” is the object that will host our GameState, respond to GameIntents, and send events to the UI (GameActions).

  1. Here I am transparently injecting some stuff into the store using DI, more on that in a bit.
  2. During my benchmarks, I found out that reentrant state transactions (which I discussed in my previous article) were tanking performance: they are 15x slower than non-reentrant ones! The time is still measured in microseconds, so reentrant transactions make sense for simple UI, but we had to squeeze every last drop of CPU power out of the engine. I added support for non-reentrant transactions in the latest release, which reduced the overhead to nanoseconds per event!
  3. Everything had to be parallel for the game engine to keep it fast, so I enabled parallel processing. But if we don’t synchronize state access, we’ll have the same race conditions we had before! By enabling this flag while keeping atomic state transactions, I achieved the best of both worlds: speed, and safety!
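
As an aside, here is a rough illustration of why atomic state updates can stay cheap under parallel intents: at their simplest they reduce to a compare-and-set retry loop. This is not FlowMVI's actual implementation, just the underlying idea.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Illustration only: a lock-free atomic state transaction as a CAS retry loop.
// Parallel callers never observe torn state, and no locks are held.
class StateHolder<S : Any>(initial: S) {
    private val ref = AtomicReference(initial)

    // Applies `transform` atomically; retries if another thread won the race.
    fun updateState(transform: (S) -> S): S {
        while (true) {
            val current = ref.get()
            val next = transform(current)
            if (ref.compareAndSet(current, next)) return next
        }
    }

    val value: S get() = ref.get()
}
```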

So far, this gets us:

  • Speed
  • Thread-safety
  • Ability to keep resources loaded on demand
  • Analytics and Crash reporting.

“Wait”, you may ask, “but there isn’t a single line of analytics code in the snippet!”, and I will answer — the magic is in the injected configuration parameter.

It installs a bunch of plug-ins transparently. We can add any logic to any container using the concept of plug-ins, so why not use those with DI? That function installs an error handler plugin that catches and sends exceptions to analytics without affecting the rest of the engine’s code, tracks user actions (Intents), and events of visiting and leaving the game engine screen as well. Having the huge game engine polluted by analytics junk is a no-no for us because we had this problem with MVVM — all of the stuff just gets piled on and on and on until it becomes unmaintainable. No more.
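
A rough, self-contained sketch of the idea (all names here are mine, not the real API): the DI graph hands the container a configuration function, and that function is the only place where analytics wiring lives.

```kotlin
// Hypothetical sketch: cross-cutting analytics installed via an injected
// configuration function, so the engine's code never mentions analytics.
class EngineBuilder {
    val onIntent = mutableListOf<(String) -> Unit>()
    val onException = mutableListOf<(Throwable) -> Unit>()
}

// Provided by the DI module. Swapping or removing analytics touches only this.
fun analyticsConfiguration(log: MutableList<String>): EngineBuilder.() -> Unit = {
    onIntent += { log += "intent:$it" }
    onException += { log += "error:${it.message}" }
}

fun buildEngine(configure: EngineBuilder.() -> Unit) = EngineBuilder().apply(configure)
```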

Starting and Stopping the Engine

Okay, so we created our Container lazily. How do we clean up and keep track of resources now?

The thing about FlowMVI is that it’s the only framework I know of that lets you stop and restart a business logic component (Store) on demand. Each store has a StoreLifecycle which lets you control and observe the store using a CoroutineScope. If the scope is canceled, the store is too, but the store can also be stopped separately, ensuring our parent-child hierarchy is always respected.

My colleagues were skeptical about this feature at first, and for a while, I thought it was useless, but this time it literally saved my ass from getting fired: we can just use the global application scope to run our logic, and stop the engine when we don’t need it to keep consuming resources!

For the implementation, we’re just going to let the Container implement an interface called GameLauncher that will access the lifecycle for us:

override suspend fun awaitShutdown() = store.awaitUntilClosed()
override fun shutdown() = store.close()
override suspend fun start(params: GameParameters) {
    val old = this.parameters.getAndUpdate { params }
    when {
        !store.isActive -> store.start(appScope).awaitStartup() // start fresh
        old == params -> store.intent(ReplayedGame) // reuse running engine
        else -> { // restart if incompatible
            store.closeAndWait() 
            store.start(appScope).awaitStartup()
        }
    }
}

Then the code from other modules will just use the interface to stop the engine when it doesn’t need the game to keep running (e.g. the user has scrolled away, left the app, etc.), and call start each time the clients want us to play the game. But this feature would only be marginally usable for us if the store didn’t have a way to do something when it is shut down. So let’s talk about resource management next.
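
For reference, my guess at what the launcher contract looks like, reconstructed from the snippet above (the exact signatures in the real codebase may differ, and GameParameters' fields are inferred from later snippets):

```kotlin
// Hypothetical reconstruction of the launcher contract other modules depend on.
interface GameLauncher {
    suspend fun start(params: GameParameters)
    fun shutdown()
    suspend fun awaitShutdown()
}

// Fields inferred from `gameId` and `playerType` usages later in the article.
data class GameParameters(val gameId: String, val playerType: String)
```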

Managing Resources

We have a lot of stuff to initialize in parallel when the game starts:

  • Remote configuration for feature flags
  • Game assets like textures need to be downloaded and cached
  • Game Configuration and the game JSON data
  • Media codec initialization
  • Video file buffering and caching
  • And more…

And almost everything here cannot be simply garbage collected. We need to close file handles, unload codecs, release resources held by native code, and return the video player to the pool to reuse it, as player creation is a very heavy process.
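
The pooling part boils down to a tiny borrow/return structure. A hypothetical minimal version (the real PlayerPool isn't shown in the article) might look like this:

```kotlin
import java.util.ArrayDeque

// Hypothetical minimal object pool: creating a player is expensive, so we
// reset and reuse instances instead of recreating them on every scroll.
class ObjectPool<T : Any>(
    private val create: () -> T,
    private val reset: (T) -> Unit,
) {
    private val free = ArrayDeque<T>()

    @Synchronized
    fun borrow(): T = free.pollFirst() ?: create()

    @Synchronized
    fun giveBack(item: T) {
        reset(item)
        free.addFirst(item)
    }
}
```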

Some of this stuff also depends on other pieces: the video file, for example, comes from the game configuration. So how do we pull this off?

Well, for starters, I created a plug-in that uses the lifecycle callbacks mentioned above to create a value when the engine starts and clean it up when the engine stops (simplified code):

public fun <T> cached(
    init: suspend PipelineContext.() -> T,
): CachedValue<T> = CachedValue(init)

fun <T> cachePlugin(
    value: CachedValue<T>,
) = plugin {
    onStart { value.init() }
    onStop { value.clear() }
}

CachedValue is just like lazy but with thread-safe control of when to clear and init the value. In our case, it calls init when the store starts, and clears the reference when the store stops. Super simple!
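
For reference, a simplified, non-suspending version of such a holder might look like this. The real CachedValue in the library is suspend-aware and tied to PipelineContext; this sketch only shows the core idea:

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Simplified sketch of a CachedValue: like `lazy`, but the owner controls both
// creation and clearing of the value.
class SimpleCachedValue<T : Any>(private val factory: () -> T) {
    private val ref = AtomicReference<T?>(null)

    // Create the value if absent. Note: under heavy contention `factory` may
    // run more than once; the library version guards against that.
    fun getOrCreate(): T = ref.updateAndGet { it ?: factory() }!!

    // Drop the reference so held resources can be released.
    fun clear() = ref.set(null)

    val isInitialized: Boolean get() = ref.get() != null
}
```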

But that plugin still has a problem because it pauses the entire store until the initialization is complete, which means our loading would be sequential instead of parallel. To fix that, we can simply use Deferred and run the initialization in a separate coroutine:

inline fun <T> asyncCached(
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.UNDISPATCHED,
    crossinline init: suspend PipelineContext.() -> T,
): CachedValue<Deferred<T>> = cached { async(context, start) { init() } }

Then we just pass our asyncCached instead of the regular one when installing the cache plugin. Sprinkle some DSL on top of that, and we get the following game-loading logic:


override val store by lazyStore(GameEngineState.Stopped) {
    configure { /* ... */ }
    val gameClock by cache {
        GameClock(coroutineScope = this) // (1)
    }
    val player by cache {
        playerPool.borrow(requireParameters.playerType)
    }
    val remoteConfig by asyncCache { 
        remoteConfigRepo.updateAndGet()
    }
    val graphicsEngine by asyncCache {
        GraphicsEngine(GraphicsRemoteConfig(from = remoteConfig())) // (2)
    }
    val gameData by asyncCache {
        gameRepository.getGameData(requireParameters().gameId)
    }
    val game by asyncCache {
         GameLoop(
             graphics = graphicsEngine(),
             remoteConfig = remoteConfig(),
             clock = gameClock,
             data = gameData(),
             params = requireParameters(),
         ).let { GameInstance(it) }
    }
    // ... more ... 

    asyncInit { // (3)
        updateState { Loading() }
        player.loadVideo(gameData().videoUrl)
        updateState {
            GameEngineState.Running(
                game = game(),
                player = player,
            )
        }
        gameClock.start()
    }
    deinit { // (4)
        graphicsEngine.release()
        player.stop()
        playerPool.return(player)
    }
}


  1. Our game clock runs an event loop and synchronizes game time with video time. Unfortunately, it requires a coroutine scope to run that loop in, and the scope should only be active during the game. Luckily, we already have one! PipelineContext, the context of the Store’s execution, is provided to plugins and implements CoroutineScope. We can just use it in our cache plugin to start the game clock, which will then automatically stop when we shut down the engine.
  2. You can see we used a bunch of asyncCaches to parallelize loading, and with the Graphics Engine, we also were able to depend on remote config inside (as an example, in reality it depends on lots of stuff). This greatly simplifies our logic, because the dependencies between components are implicit now, and the requesting party who wants just the graphics engine doesn’t have to manage the dependencies of it! The operator invoke (parentheses) is a shorthand for Deferred.await() for that extra sweet taste.
  3. We have also used an asyncInit which essentially launches a background job in the current game engine’s gameplay scope to load the game. Inside the job, we do final preparations, wait for all of the dependencies, and start the game clock.
  4. We have used the built-in deinit plugin to put all of our cleanup logic in the callback that is invoked as soon as the game engine is stopped (and its scope is canceled). It will be run before our cached values are cleaned up (because it was installed later), but after our jobs have been canceled, so that we can do what we want, and the cache plugin will then garbage-collect the rest of the stuff without us worrying about leaks.

Overall, these ~50 lines of code replaced 1,500 lines of our old game engine’s implementation! I had to pick my jaw up off the floor when I realized how powerful these patterns are for business logic.

But we’re still lacking one thing.

Error Handling

A lot of things in the engine can go wrong during gameplay:

  • Some game-author forgot to add a frame to an animation,
  • A person lost their connection during the game,
  • The shaders failed rendering due to a platform bug, and more…

Usually, only the main errors for API calls are handled in apps with wrappers like ApiResult or some kind of try/catch. But imagine wrapping every single line of the Game Engine’s code in a try-catch… That would mean hundreds of lines of try-catch-finally garbage!

Well, you probably know where this is going. Since we can intercept any event now, let’s make an error-handling plug-in! I named it recover, and now our code looks like this:

override val store by lazyStore(GameEngineState.Stopped) {
   configure { /* ... */ } 
   val player by cache { /* ... */ }
   
   // ...

   recover { e ->
       if (config.debuggable) updateState { // (1)
           GameEngineState.Error(e) 
       } else when(e) { // (2)
           is StoreTimeoutException, is GLException -> Unit // just report to analytics
           is MediaPlaybackException -> player.retry()
           is AssetCorruptedException -> assetManager.refreshAssetsSync()
           is BufferingTimeoutException -> action(ShowSlowInternetMessage)
           // ... more ...
           else -> shutdown() // (3)
       }
       null // swallow the exception
   }
}
  1. If our store is configured to be debuggable (config is available in store plug-ins), we can show a full-screen overlay with the stack trace to let our QA team easily report errors to devs before they get to production. Fail Fast principle in action.
  2. In production, however, we will handle some errors by retrying, skipping an animation, or warning the user about their connection without interrupting the gameplay.
  3. If we can’t handle an error and cannot recover, then we shut down the engine and let the user try to play the game again, without crashing the app or showing obscure messages (those go to crashlytics).

With this, we’ve got ourselves error-handling for any existing and new code a developer may ever add to our game engine, with 0 try-catches.

Final touches

We’re almost done! This article is getting long, so I’ll blitz through some additional plug-ins I had to install to support our use cases:

override val store by lazyStore(GameEngineState.Stopped) {
    configure { /* ... */ }

    // ...
    val subs = awaitSubscribers() // (1)
    val jobs = manageJobs<GameJobs>() // (2)
    initTimeout(5.seconds) { // (3)
        subs.await()
    }
    whileSubscribed { // (4)
        assetManager.loadingProgress.collect { progress ->
            updateState<Loading, _> {
                copy(progress = progress)
            }
        }
    }
    install(
        autoStopPlugin(jobs), // (5)
        resetStatePlugin(), // (6)
    )
}
  1. Since a developer can make the mistake of starting the game, but never displaying the actual gameplay experience (user left, a bug, plans changed, etc…), I am using the pre-made awaitSubscribers plugin in snippet (3) to see if they appear within 5 seconds of starting the game, and if not, close the store and auto-cleanup the held resources to prevent leaks. Boom!
  2. I’m using another plug-in — JobManager, to run some long-running operations in the background. Code that uses it didn’t fit, but essentially it’s needed to track whether the user is currently playing.
  3. InitTimeout is a custom plugin that verifies whether the game has finished loading within 5 seconds, and if not, we pass an error to our recover plugin to decide what to do and report the issue to analytics.
  4. The whileSubscribed plugin launches a job that is only active when subscribers (in our case, UI) are present, where we update the visuals of the loading progress only when the user is actually seeing the loading screen. It allows us to easily avoid resource leaks if the game engine is covered up by something or hidden.
  5. The autoStopPlugin uses our job manager to watch for game load progress and gameplay progress. It looks at whether we have subscribers to pause the game when the user leaves, then stop it once the engine is not used for a while, eliminating the risk of leaking memory.
  6. The resetStatePlugin is a built-in one I had to install to auto-cleanup state when the game ends. By default, stores will not have their state reset when they stop. This is good for regular UI but not in our case – we want the engine to go back to the Stopped state when the game ends.
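
The subscriber-timeout idea from points (1) and (3) can be sketched in a few lines. The real plugins are coroutine-based; this hedged, standalone version uses a latch just to show the logic:

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

// Hypothetical sketch: if no subscriber shows up before the deadline, the
// caller shuts the engine down and frees the eagerly loaded resources.
class SubscriberGate {
    private val latch = CountDownLatch(1)

    fun onFirstSubscriber() = latch.countDown()

    // Returns true if a subscriber arrived in time; false means "shut down".
    fun awaitFirstSubscriber(timeoutMs: Long): Boolean =
        latch.await(timeoutMs, TimeUnit.MILLISECONDS)
}
```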

All of those plugins were already in the library, so using them was a piece of cake.

Conclusion

It was a wild ride, but after all this, I not only managed to keep my job, but I think that the overall solution has turned out pretty great. The engine went from 7+k to just 400 lines of readable, linear, structured, performant, extensible code, and the users are already enjoying the results:

  • Loading time went from ~20 seconds to just 1.75 seconds!
  • Crashed games fell from 8% to 0.01%!
  • We improved the throughput of the game event processing by 1700%
  • Video buffering occurrences during games went from ~31% to <10% due to our caching
  • Battery consumption during gameplay was reduced by orders of magnitude
  • ANRs during gameplay fell to being statistically 0
  • GC pressure decreased by 40% during gameplay

Hopefully, by this point, I’ve shown why the patterns we used to hate like Decorators, Interceptors, and Chain of Responsibility can be insanely helpful when building not just some backend service, networking code, or a specialized use-case, but also implementing the regular application logic, including UI and state management.

With the power Kotlin gives us for building DSLs, we can turn these fundamental patterns (used in software development for decades) from a mess of boilerplate, inheritance, and complicated delegation into fast, straightforward, compact, linear code that is fun and efficient to work with. I encourage you to build something like this for your own app’s architecture and reap the benefits.

And if you don’t want to dive into that and want something already available, or are curious to learn more, then consider checking out the original library where I implemented everything mentioned here on GitHub, or dive right into the quickstart guide to try it in 10 minutes.

P.S. If we get 500 claps on this article, I’ll receive neural activation which will motivate my brain to write another article, explaining in detail how to implement the Plugin system in your own app.

This article is previously published on proandroiddev.com.
