Enabling playback voice control using Google Assistant, Alexa, and any other integrated voice assistant
Photo by Rome Wilkerson on Unsplash
In this post, we’ll learn about the Android MediaSession API, why we should use it, and how to implement it in Android TV (or Fire TV) apps. After laying the groundwork, we’ll follow a step-by-step guide to a basic MediaSession implementation. The sample app used for this article and all the code below can be found here.
If you are new to how playback controls are implemented with the Leanback components, or need a recap, check my previous articles. They cover everything from the basics to further customizations of the Leanback playback controls.
📺 Android TV Leanback: Playback Controls — Part 1
What is a MediaSession?
As per the documentation, a MediaSession “Allows interaction with media controllers, volume keys, media buttons, and transport controls”. More than that, a MediaSession is the control center where we can read information about what is currently being played on the Android device and dispatch media control actions such as play, pause, rewind, skip, seek, and more.
Through an active MediaSession, the Android system can control an app’s media playback and query information about it. Whenever you send, for example, a play/pause command by pressing or tapping on your earbuds, you are triggering MediaSession callbacks that ask the underlying player app to perform those actions.
On the other hand, a music or video app should provide information about what’s currently playing to an active MediaSession. This is how the Android system can display this data to the user and provide interfaces where the user can dispatch media actions, for example, using on-screen buttons or sending voice commands through Google Assistant or Amazon Alexa.
Spotify MediaSession notification
What are the benefits?
The benefits of implementing a MediaSession in our apps vary from device to device.
- The media playback can be controlled by voice on devices that support Google Assistant or another voice assistant.
- On TV devices that support HDMI-CEC, the playback can be controlled using the conventional remote control media keys.
- On phones, it enables on-screen media controls. Depending on the Android version, the OS can get information about a MediaSession and provide media controls that will appear on the lock screen.
- It also enables other devices, for example, wearables, to connect to your app’s media session and control it from your wrist.
- It provides a way for other apps to control your MediaSession. For example, a navigation app can request a music player to play your favorite playlist when you start driving.
This is a powerful API. It’s not just about transport controls (play, pause, etc.) or displaying the content metadata. It’s also about sending specific media playback requests to an app capable of handling and performing them. This is what allows you to say confidently:
“Hey Google, play The Suburbs from Arcade Fire on Spotify.”
Step-by-Step Implementation Guide
Setting up the project
The first step is to add the androidx.media dependency to your app’s build.gradle.
implementation "androidx.media:media:1.6.0"
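If the module uses the Gradle Kotlin DSL (a build.gradle.kts file, which is an assumption about your project and not something the sample app requires), the same dependency would be declared roughly like this:

```kotlin
// build.gradle.kts (module level); same artifact and version as the line above.
dependencies {
    implementation("androidx.media:media:1.6.0")
}
```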
After that, and only if you plan to target Amazon Fire TV devices, you must add the following permission and, optionally, a meta-data entry to the AndroidManifest.xml.
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    ...
    <uses-permission android:name="com.amazon.permission.media.session.voicecommandcontrol" />
    ...
    <application>
        <meta-data
            android:name="com.amazon.voice.supports_background_media_session"
            android:value="true" />
    </application>
</manifest>
AndroidManifest.xml
Note: The com.amazon.voice.supports_background_media_session metadata is optional, and the behavior change from its usage is described here.
Creating the MediaSession
To create a MediaSession, the only required arguments are a Context and a String tag used to identify the MediaSession when debugging.
private val mediaSession = MediaSessionCompat(context, "VideoPlayback")
Conceptually, a MediaSession should live as long as the media player lives. In video apps, this usually means as long as the Activity or Fragment with the playback UI is in the foreground. An audio app’s MediaSession, on the other hand, might share the lifecycle of the Service that holds the player instance, since audio apps usually support background playback.
With this in mind, I decided to manage the MediaSession lifecycle for the sample app entirely inside the SimplePlaybackTransportControlGlue, as it receives all the callbacks about the lifecycle of its host, in this case, a VideoSupportFragment.
import android.content.Context
import android.support.v4.media.session.MediaSessionCompat
import androidx.leanback.media.MediaPlayerAdapter
import androidx.leanback.media.PlaybackTransportControlGlue

class SimplePlaybackTransportControlGlue(
    context: Context,
    playerAdapter: MediaPlayerAdapter,
) : PlaybackTransportControlGlue<MediaPlayerAdapter>(context, playerAdapter) {

    private val mediaSession = MediaSessionCompat(context, "VideoPlayback")

    init {
        // Register the listener that receives the dispatched media actions.
        mediaSession.setCallback(SimpleMediaSessionCallback())
    }

    override fun onHostStart() {
        super.onHostStart()
        // The session only receives callbacks while it is active.
        mediaSession.isActive = true
    }

    override fun onHostPause() {
        super.onHostPause()
        mediaSession.isActive = false
    }

    override fun onDetachedFromHost() {
        super.onDetachedFromHost()
        // Free the session resources once the glue leaves its host.
        mediaSession.release()
    }
}
Then, we start the MediaSession configuration by providing a listener that will be called whenever media actions are dispatched. We do so by calling the MediaSession.setCallback() function and passing an instance of MediaSessionCompat.Callback. Inside these callbacks, we’ll perform the play, pause, seek, and any other operation supported by our MediaSession.
private inner class SimpleMediaSessionCallback : MediaSessionCompat.Callback() {
    override fun onPlay() = this@SimplePlaybackTransportControlGlue.play()
    override fun onPause() = this@SimplePlaybackTransportControlGlue.pause()
    override fun onSkipToNext() = this@SimplePlaybackTransportControlGlue.next()
    override fun onSkipToPrevious() = this@SimplePlaybackTransportControlGlue.previous()
    override fun onRewind() = this@SimplePlaybackTransportControlGlue.rewind()
    override fun onFastForward() = this@SimplePlaybackTransportControlGlue.fastForward()
    override fun onSeekTo(pos: Long) = this@SimplePlaybackTransportControlGlue.seekTo(pos)
}
Custom Callback class inside SimplePlaybackTransportControlGlue
For the callbacks to be called, though, the MediaSession must be active. We can activate and deactivate a MediaSession by setting MediaSession.isActive = true/false.
Playback State
Once you have the playback information about your media, you should pass it on to the MediaSession. For that, we use the PlaybackState class. This is how the MediaSession knows which actions are available and the current playback state.
The existing playback states for a MediaSession and their documentation can be found inside the PlaybackStateCompat class or here. In short, they are: STATE_NONE, STATE_STOPPED, STATE_PAUSED, STATE_PLAYING, STATE_FAST_FORWARDING, STATE_REWINDING, STATE_BUFFERING, STATE_ERROR, STATE_CONNECTING, STATE_SKIPPING_TO_PREVIOUS, STATE_SKIPPING_TO_NEXT, and STATE_SKIPPING_TO_QUEUE_ITEM.
We can store an Int value that represents the current playback state and, whenever it changes, update the MediaSession PlaybackState.
private var playbackState: Int = -1
    set(value) {
        if (field != value) {
            field = value
            invalidatePlaybackState() // We'll cover this function later on.
        }
    }
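The setter above only triggers an update when the value actually changes. The following is a minimal, Android-free sketch of that behavior. The PlaybackStateHolder class, its invalidations counter, and the two local state constants are hypothetical stand-ins introduced for illustration and are not part of the sample app.

```kotlin
// Local stand-ins so the snippet runs without the Android SDK.
// In an app, use PlaybackStateCompat.STATE_PAUSED and STATE_PLAYING instead.
val STATE_PAUSED = 2
val STATE_PLAYING = 3

// Hypothetical stand-in for the glue: it counts how often the state is invalidated.
class PlaybackStateHolder {
    var invalidations = 0
        private set

    var playbackState: Int = -1
        set(value) {
            // Skip redundant updates so the session is not refreshed needlessly.
            if (field != value) {
                field = value
                invalidatePlaybackState()
            }
        }

    // In the real glue this would rebuild and publish the PlaybackState.
    private fun invalidatePlaybackState() {
        invalidations++
    }
}
```

Assigning STATE_PLAYING twice in a row and then STATE_PAUSED would leave the counter at two, because the repeated assignment is ignored.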
Still inside the SimplePlaybackTransportControlGlue, we can listen to almost all the player events we need. This was another reason I decided to keep the MediaSession inside the glue.
Add the following code to your glue to listen to these events and update the playbackState value.
class SimplePlaybackTransportControlGlue(...) {
    ...
    override fun onPlayCompleted() {
        super.onPlayCompleted()
        playbackState = PlaybackStateCompat.STATE_NONE
    }

    fun onStartBuffering() {
        playbackState = PlaybackStateCompat.STATE_BUFFERING
    }

    fun onFinishedBuffering() {
        playbackState = when (isPlaying) {
            true -> PlaybackStateCompat.STATE_PLAYING
            else -> PlaybackStateCompat.STATE_PAUSED
        }
    }

    fun rewind() {
        playbackState = PlaybackStateCompat.STATE_REWINDING
        seekTo(currentPosition - 10_000) // Jump back 10 seconds.
    }

    fun fastForward() {
        playbackState = PlaybackStateCompat.STATE_FAST_FORWARDING
        seekTo(currentPosition + 10_000) // Jump ahead 10 seconds.
    }
    ...
}
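The rewind() and fastForward() functions above jump by a fixed 10 seconds, so the target can land before the start or past the end of the media. One possible way to guard against that is sketched below. The clampedSeekTarget helper is hypothetical and not part of the sample app, and it assumes all values are in milliseconds.

```kotlin
// Hypothetical helper: keeps a relative seek inside the bounds of the media.
// All values are in milliseconds.
fun clampedSeekTarget(currentPosition: Long, deltaMs: Long, duration: Long): Long {
    // Never seek before the start of the media.
    val target = (currentPosition + deltaMs).coerceAtLeast(0L)
    // A non-positive duration is treated as "unknown", so only the lower bound applies.
    return if (duration > 0L) target.coerceAtMost(duration) else target
}
```

Inside the glue, this could be used as seekTo(clampedSeekTarget(currentPosition, -10_000, duration)), assuming the glue’s duration property reports the media length.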
A few events aren’t available to the glue directly. More specifically, the buffering started/finished and error events are sent to the glue host, the VideoPlaybackFragment. You can receive them on the host and forward them to the glue using public functions. (For simplicity, I decided not to handle the error events, as there are too many of them.)
Playback Actions
We must tell the MediaSession which actions it supports. This information is passed as a bitmask of the available actions. At the time of writing, there are 22 different actions a MediaSession supports (check them here). Again, we’ll only support the most common media actions for simplicity.
fun mediaSessionSupportedActions(): Long {
    // Combine the flags with a bitwise OR to build the bitmask.
    return PlaybackStateCompat.ACTION_PAUSE or
        PlaybackStateCompat.ACTION_PLAY or
        PlaybackStateCompat.ACTION_PLAY_PAUSE or
        PlaybackStateCompat.ACTION_REWIND or
        PlaybackStateCompat.ACTION_FAST_FORWARD or
        PlaybackStateCompat.ACTION_SKIP_TO_NEXT or
        PlaybackStateCompat.ACTION_SKIP_TO_PREVIOUS
}
Note: The supported actions should be updated based on your media playback. For example, if your app plays ads, usually, you shouldn’t allow the user to fast-forward or rewind when the ad is playing. In this case, you should update the PlaybackState and remove the ACTION_FAST_FORWARD, ACTION_REWIND, and any other action that should be prevented during ad playback. Look at the mediaSessionSupportedActions() from the sample app here to clarify this idea.
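As a rough, Android-free illustration of the note above, the sketch below builds the bitmask differently depending on whether an ad is playing. The flag values are local stand-ins with illustrative bit positions, and supportedActions() and isSupported() are hypothetical helpers. This is not the sample app’s implementation.

```kotlin
// Stand-in flags so the snippet runs without the Android SDK.
// In an app, use the PlaybackStateCompat.ACTION_* constants instead.
val ACTION_PAUSE = 1L shl 1
val ACTION_PLAY = 1L shl 2
val ACTION_REWIND = 1L shl 3
val ACTION_FAST_FORWARD = 1L shl 6
val ACTION_PLAY_PAUSE = 1L shl 9

// Transport actions that stay available at all times.
val baseActions = ACTION_PLAY or ACTION_PAUSE or ACTION_PLAY_PAUSE

// Seeking actions that are withheld while an ad is on screen.
val seekActions = ACTION_REWIND or ACTION_FAST_FORWARD

// Hypothetical helper: returns the bitmask for the current kind of content.
fun supportedActions(isAdPlaying: Boolean): Long =
    if (isAdPlaying) baseActions else baseActions or seekActions

// Hypothetical helper: true when the given flag is set in the bitmask.
fun isSupported(actions: Long, action: Long): Boolean = (actions and action) != 0L
```

Combining flags with a bitwise or stays correct even if a flag is accidentally listed twice, which makes it a safe default for this kind of mask.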
Setting MediaSession PlaybackState
After gathering all the information about the current playback status, you can pass it to the MediaSession as follows:
private fun invalidatePlaybackState() {
    val playbackStateBuilder = PlaybackStateCompat.Builder()
        .setState(playbackState, currentPosition, 1.0F)
        .setActions(mediaSessionSupportedActions())
        .setBufferedPosition(bufferedPosition)
    mediaSession.setPlaybackState(playbackStateBuilder.build())
}
Note: This is where you can also set the error state of the MediaSession by calling PlaybackStateCompat.Builder.setErrorMessage().
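The setState() call takes the state, the position, and the playback speed. A PlaybackState also records when the position was last updated, so a controller can estimate the live position between updates without the app publishing a new state on every tick. The sketch below shows that arithmetic in plain Kotlin. The estimatePosition helper and its parameter names are hypothetical, and times are assumed to be in milliseconds.

```kotlin
// Hypothetical helper: estimates the live position from the last published values.
// Times are in milliseconds.
fun estimatePosition(
    lastPosition: Long,
    playbackSpeed: Float,
    lastUpdateTime: Long,
    now: Long,
    isPlaying: Boolean,
): Long {
    // The position only advances while the media is playing.
    if (!isPlaying) return lastPosition
    val elapsed = now - lastUpdateTime
    // Scale the elapsed time by the speed (1.0 is normal playback).
    return lastPosition + (elapsed * playbackSpeed).toLong()
}
```

In practice, this suggests publishing a fresh PlaybackState whenever the state, speed, or position changes discontinuously, for example, after a seek.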
Releasing the MediaSession
After finishing the playback, remember to call MediaSession.release() to release the resources used by the MediaSession that are no longer needed.
Testing the Implementation
Suppose you are using an emulator or cannot use the voice assistant on your device for some reason. In that case, you can use adb to dispatch the MediaSession actions directly to your device.
- On Android 11 and above, you can run adb shell cmd media_session dispatch <play|pause|play-pause|rewind|fast-forward|next|previous|stop|mute>.
- On lower versions, the command is slightly different: adb shell media dispatch <play|pause|play-pause|rewind|fast-forward|next|previous|stop|mute>.
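If you script these checks, the choice between the two commands can be captured in a small helper. The sketch below is only an illustration of the rule described above. The mediaDispatchCommand function and the dispatchActions set are hypothetical names, and it assumes Android 11 corresponds to API level 30.

```kotlin
// Actions accepted by the dispatch command, as listed above.
val dispatchActions = setOf(
    "play", "pause", "play-pause", "rewind", "fast-forward",
    "next", "previous", "stop", "mute",
)

// Hypothetical helper: builds the adb command line for a device API level.
fun mediaDispatchCommand(apiLevel: Int, action: String): String {
    require(action in dispatchActions) { "Unsupported action: $action" }
    // Android 11 (API level 30) and above use the media_session service command.
    return if (apiLevel >= 30) {
        "adb shell cmd media_session dispatch $action"
    } else {
        "adb shell media dispatch $action"
    }
}
```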
ExoPlayer Integration
ExoPlayer has an extension called MediaSessionConnector that facilitates the integration with the MediaSession API. You can check this Google codelab, the official docs, and this Medium post for more details.
Reference and Documentation
Thank you!
Thanks for going through this entire post! Make sure to check the sample app and follow me to be notified when new content is available!
I suggest this very special list of Android TV articles. It is frequently updated, and there’s a lot of value in there!
https://admqueiroga.medium.com/list/android-tv-leanback-guide-9a363e566f38
You can also connect with me on Twitter.
This article was previously published on proandroiddev.com