How does Cloud Firestore work?

Blog Infos

Author

Victor Brandalise

Published

16. February 2022

Topics

Cloud Firestore

Cloud Firestore is a popular database for mobile and web applications. According to its documentation:

[…] It keeps your data in sync across client apps through realtime listeners and offers offline support for mobile and web so you can build responsive apps that work regardless of network latency or Internet connectivity.

Continuing the “How does X work” series, today we’re gonna explore how Firestore works under the hood. How do listeners work? How does it send/receive data from the backend? How does it keep things stored locally? Those are some of the questions we’ll be exploring today.

FirebaseApp

FirebaseInitProvider will handle the initialization of Firebase for the default project that it’s set to operate with using the data in the app’s google-services.json file. When building using Gradle, this ContentProvider is automatically integrated into the app’s manifest and executed when the app is launched.

If an app needs access to another Firebase project in addition to the default project, use initializeApp(Context, FirebaseOptions, String) to do that.

FirebaseFirestore

Represents a single Cloud Firestore database. It’s probably the most used class as it acts as a facade to other classes. It provides methods such as setFirestoreSettings, collection, document, runTransaction, runBatch, waitForPendingWrites, enableNetwork/disableNetwork, clearPersistence, etc.

FirebaseFirestoreSettings

Specifies the configuration for your Firestore instance. The configurable values are host, sslEnabled, persistenceEnabled and cacheSizeBytes.

Data Bundles

Data bundles are serialized collections of documents.

These data bundles can be saved to a CDN or another object storage provider, and then loaded from your client applications. By doing that, you can avoid making extra calls against the Firestore database.

You can read more about Data Bundles here.

FirestoreMultiDbComponent

Even though most of us only use one database instance, Firestore supports multiple instances. FirestoreMultiDbComponent is basically a container that stores references to all available databases.

	class FirestoreMultiDbComponent {
	    private final Map<String, FirebaseFirestore> instances = new HashMap<>();
	}

When you call FirebaseFirestore.getInstance(), it’s actually calling FirestoreMultiDbComponent with the default database id.

	@NonNull
	public static FirebaseFirestore getInstance() {
	    FirebaseApp app = FirebaseApp.getInstance();
	    ...
	    return getInstance(app, DatabaseId.DEFAULT_DATABASE_ID);
	}

Here’s the method that returns a database given its id. It’s also responsible for creating a FirebaseFirestore if there’s none registered.

	@NonNull
	synchronized FirebaseFirestore get(@NonNull String databaseId) {
	    FirebaseFirestore firestore = instances.get(databaseId);
	    if (firestore == null) {
	        firestore = FirebaseFirestore.newInstance(...);
	        instances.put(databaseId, firestore);
	    }
	    return firestore;
	}

FirestoreClient

It looks very similar to FirebaseFirestore but here you start to see some real work being done.

It’s also responsible for creating the Datastore, the class that represents the connection to Firebase Firestore’s server.

It contains waitForPendingWrites, which returns a task that resolves once all the writes that were pending when the method was called have been acknowledged by the server.

It also contains write. According to its documentation:

Writes mutations. The returned task will be notified when it’s written to the backend.

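There’s an interesting thing to note: write behaves differently depending on the user’s connectivity. The returned task only completes once the backend acknowledges the write, so while the user is offline it stays pending even though the change has already been applied locally. That might be the behavior most people expect, but you need to remember that Cloud Firestore can also be used as an offline database. If your app allows users to use it offline, you’ll have to take that into consideration.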

Most people have never heard of listen(Query, ListenOptions, EventListener<ViewSnapshot>), but you’ve probably used DocumentReference.addSnapshotListener or Query.addSnapshotListener. listen is the method they call to listen for changes.

listen shares the same data source for equal queries, so calling addSnapshotListener from multiple places with the same query is not costly.

To accomplish its purpose, listen relies mostly on QueryListener.

QueryListener

As you may know, Firestore is a reactive database meaning that when somebody updates a document you also receive the update if you’re listening for it.

QueryListener is one of the classes responsible for that.

Before we understand how QueryListener works we need to learn what ViewSnapshot is.

ViewSnapshot

A view snapshot is an immutable capture of the results of a query and the changes to them.

Whenever you query for something you don’t get the actual values you queried for. Firestore returns a QuerySnapshot and inside it you can find a ViewSnapshot that contains the values you queried for. Take a look at QuerySnapshot#getDocuments.

	@NonNull
	public List<DocumentSnapshot> getDocuments() {
	    List<DocumentSnapshot> res = new ArrayList<>(snapshot.getDocuments().size());
	    for (Document doc : snapshot.getDocuments()) {
	        res.add(convertDocument(doc));
	    }
	    return res;
	}

ViewSnapshot is basically a data class that holds a lot of related data together. It contains a reference to the Query that generated it, the old documents, the new documents, a list of document changes, whether the values are from cache, etc.

Now let’s get back to QueryListener. According to the documentation:

QueryListener takes a series of internal view snapshots and determines when to raise events.

Why does it say “determines when to raise events”, don’t all changes raise events? Well, not necessarily. When you start listening for a query you can give it a MetadataChanges, that’s going to define what kind of changes you’ll be notified of. You have two options: INCLUDE and EXCLUDE.

Currently document snapshots have two metadata properties: hasPendingWrites and isFromCache.

If you specify MetadataChanges.INCLUDE, you’ll also be notified when either of these two fields changes. Let’s suppose you’re listening to collection C and a user who has no connectivity writes document D to collection C. Initially hasPendingWrites will be true because this data has not been written to the backend yet. When the document is uploaded, the data in your document is probably not going to change, but hasPendingWrites will become false and you’ll receive an update for that.

What QueryListener does is basically wait for new view snapshots and decide whether or not it should notify you of a change based on these options.
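
Here’s a rough sketch of that decision in plain Java. It is a simplified model and not the SDK’s QueryListener: Snapshot, MetadataMode, SketchQueryListener and shouldRaiseEvent are hypothetical names, the document data is reduced to a single String, and only the two metadata flags are modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a view snapshot: the data plus the two metadata flags.
final class Snapshot {
    final String data;
    final boolean hasPendingWrites;
    final boolean isFromCache;

    Snapshot(String data, boolean hasPendingWrites, boolean isFromCache) {
        this.data = data;
        this.hasPendingWrites = hasPendingWrites;
        this.isFromCache = isFromCache;
    }
}

// Hypothetical equivalent of the INCLUDE / EXCLUDE option.
enum MetadataMode { INCLUDE, EXCLUDE }

final class SketchQueryListener {
    private final MetadataMode mode;
    private Snapshot last; // last snapshot that was actually delivered
    final List<Snapshot> raised = new ArrayList<>();

    SketchQueryListener(MetadataMode mode) {
        this.mode = mode;
    }

    // Called for every new snapshot; only some of them become events.
    void onViewSnapshot(Snapshot next) {
        if (shouldRaiseEvent(next)) {
            last = next;
            raised.add(next);
        }
    }

    private boolean shouldRaiseEvent(Snapshot next) {
        if (last == null) {
            return true; // assumption: the first snapshot is always delivered
        }
        boolean dataChanged = !Objects.equals(last.data, next.data);
        boolean metadataChanged = last.hasPendingWrites != next.hasPendingWrites
                || last.isFromCache != next.isFromCache;
        // Metadata-only changes are delivered only when the listener asked for them.
        return dataChanged || (mode == MetadataMode.INCLUDE && metadataChanged);
    }
}
```

Feeding the same two snapshots (first with hasPendingWrites set, then without) to an INCLUDE listener and an EXCLUDE listener raises two events for the former and one for the latter, which matches the document D scenario above.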

Mutation

Represents a Mutation of a document.

Mutation is exactly what you think it is, it’s the change of something. A mutation of a document is one or more changes in a document. A change can mean setting or removing something. There are 3 main mutations:

DeleteMutation — represents that a document was deleted
SetMutation — represents that a whole document was created or changed
PatchMutation — represents that some fields in a document were created or changed

Mutation also includes the field transformation operations such as array union, array remove, increment value, and server timestamp.

Why are Mutations needed? Can’t Firestore simply change the document and store that? Let’s see a few reasons why it’s not so simple.

First, Firestore works offline. If we’re both offline and I modify one field while you modify another, we expect the document to have both changes once we’re back online. That is much harder to achieve if only the mutated document is stored.

Second, field transformations such as array union don’t have a reference to the whole document so Firestore needs a way to represent that transformation for it to be applied on the server.
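
To make the first point concrete, here’s a minimal sketch in plain Java. It does not use the SDK’s Mutation classes: SketchMutation, SketchSet, SketchPatch, SketchDelete and SketchMutations are hypothetical names, and a document is modelled as a simple Map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a mutation: it turns one document state into the next.
interface SketchMutation {
    // Returns the new document state, or null if the document does not exist afterwards.
    Map<String, Object> applyTo(Map<String, Object> doc);
}

// Replaces the whole document.
final class SketchSet implements SketchMutation {
    private final Map<String, Object> value;

    SketchSet(Map<String, Object> value) {
        this.value = value;
    }

    public Map<String, Object> applyTo(Map<String, Object> doc) {
        return new HashMap<>(value);
    }
}

// Changes only the listed fields and leaves the others untouched.
final class SketchPatch implements SketchMutation {
    private final Map<String, Object> fields;

    SketchPatch(Map<String, Object> fields) {
        this.fields = fields;
    }

    public Map<String, Object> applyTo(Map<String, Object> doc) {
        if (doc == null) {
            return null; // simplification: a patch never creates a document
        }
        Map<String, Object> result = new HashMap<>(doc);
        result.putAll(fields);
        return result;
    }
}

// Removes the document.
final class SketchDelete implements SketchMutation {
    public Map<String, Object> applyTo(Map<String, Object> doc) {
        return null;
    }
}

final class SketchMutations {
    // Applies the mutations in order, starting from the given document.
    static Map<String, Object> applyAll(Map<String, Object> doc, List<SketchMutation> mutations) {
        Map<String, Object> current = doc;
        for (SketchMutation mutation : mutations) {
            current = mutation.applyTo(current);
        }
        return current;
    }
}
```

If I patch one field and you patch another, applying both patches to the same base document keeps both changes. Had each of us stored a full copy of the document instead, whichever copy arrived last would overwrite the other person’s field.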

MutationQueue

A queue of mutations to apply to the remote store.

Whenever you create, update or delete a document, a mutation is created. Mutations are submitted to the MutationQueue individually, or as a group if you’re using a WriteBatch. Either way that creates a MutationBatch, which is simply a collection of mutations that will be sent to the server together.

Mutations remain in the MutationQueue until removeMutationBatch is called.

MutationQueue is an interface and, like many other classes such as Persistence, ComponentProvider and IndexManager, it has 2 implementations: a memory implementation and a SQLite implementation.

If you set FirebaseFirestoreSettings.persistenceEnabled to true, the SQLite implementation of these classes will be used to persist the changes locally.

The memory and SQLite implementations are very similar, the main difference is where their data comes from. Let’s see how these classes handle MutationQueue#isEmpty by looking at a reduced code version:

	class MemoryMutationQueue {
	    private final List<MutationBatch> queue;

	    @Override
	    public boolean isEmpty() {
	        return queue.isEmpty();
	    }
	}

	class SQLiteMutationQueue {
	    private final SQLitePersistence db;

	    @Override
	    public boolean isEmpty() {
	        return db.query("SELECT batch_id FROM mutations WHERE uid = ? LIMIT 1")
	            .binding(uid)
	            .isEmpty();
	    }
	}

The reason the SQLite implementation filters by uid (user id) is that the same database is shared among multiple users, whereas the memory implementation is instantiated per user.

Persistence

Persistence is the lowest-level shared interface to persistent storage in Firestore.

What does that mean? By “lowest-level shared interface” it’s talking about shared between memory and SQLite. All the methods below have a memory and a SQLite implementation.

getMutationQueue returns a different MemoryMutationQueue instance per user when using the memory implementation and a new SQLiteMutationQueue that shares the database when using the SQLite implementation.
runTransaction uses the native transaction mechanism provided by SQL databases when using SQLite and ReferenceDelegate when dealing with the in-memory implementation.
and so on…

	abstract MutationQueue getMutationQueue(User user);
	abstract TargetCache getTargetCache();
	abstract RemoteDocumentCache getRemoteDocumentCache();
	abstract IndexManager getIndexManager();
	abstract BundleCache getBundleCache();
	abstract DocumentOverlay getDocumentOverlay(User user);
	abstract void runTransaction(String action, Runnable operation);
	// other methods..

This interface simplifies a lot of code: callers talk to the in-memory and SQLite implementations through the same API instead of dealing with 2 different classes.
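
The idea of one interface with a per-user flavour and a shared-storage flavour can be sketched in plain Java as follows. This is only an illustration: SketchPersistence, SketchMutationQueue and both implementations are hypothetical names, and a plain list of rows stands in for the SQLite table so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface SketchMutationQueue {
    void add(String batch);
    boolean isEmpty();
}

interface SketchPersistence {
    SketchMutationQueue getMutationQueue(String uid);
}

// Memory flavour: one queue object per user, so no filtering by uid is needed.
final class SketchMemoryPersistence implements SketchPersistence {
    private final Map<String, SketchMutationQueue> queues = new HashMap<>();

    public SketchMutationQueue getMutationQueue(String uid) {
        SketchMutationQueue queue = queues.get(uid);
        if (queue == null) {
            final List<String> batches = new ArrayList<>();
            queue = new SketchMutationQueue() {
                public void add(String batch) { batches.add(batch); }
                public boolean isEmpty() { return batches.isEmpty(); }
            };
            queues.put(uid, queue);
        }
        return queue;
    }
}

// Shared-table flavour: every user's rows live in one list (standing in for a
// SQLite table), so each queue has to filter by uid.
final class SketchSharedTablePersistence implements SketchPersistence {
    private final List<String[]> rows = new ArrayList<>(); // each row is {uid, batch}

    public SketchMutationQueue getMutationQueue(final String uid) {
        return new SketchMutationQueue() {
            public void add(String batch) { rows.add(new String[] {uid, batch}); }
            public boolean isEmpty() {
                for (String[] row : rows) {
                    if (row[0].equals(uid)) {
                        return false;
                    }
                }
                return true;
            }
        };
    }
}
```

Code that only knows about SketchPersistence gets the same answers from both flavours: after one user adds a batch, that user’s queue is non-empty while another user’s queue is still empty.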

LocalStore

LocalStore is a final class, the same implementation is used whether or not persistence is enabled.

Just because you have persistence disabled doesn’t mean Firestore doesn’t keep things locally. When persistence is disabled those things are kept in memory, when it’s enabled they are kept in SQLite.

Imagine you fetched document A from Firestore, you update the document and now you have A’, do you think A’ is really stored locally? Take a look at the documentation provided by Firestore:

The local store provides the local version of documents that have been modified locally. It maintains the constraint: LocalDocument = RemoteDocument + Active(LocalMutations)

The only things that are stored locally (either in memory or in SQLite) are RemoteDocuments and Mutations. The A’ document you have is a combination of those 2 things. Here’s how Firestore does that:

You call FirebaseFirestore#document to get a DocumentReference.
You call DocumentReference#get(Source.CACHE) to force Firestore to return the document available locally.
DocumentReference calls FirestoreClient#getDocumentFromLocalCache, which calls LocalStore#readDocument.
LocalStore calls LocalDocumentsView#getDocument.

	// LocalDocumentsView.java

	Document getDocument(DocumentKey key) {
	    // 1. Get all mutations for the given document
	    List<MutationBatch> batches = mutationQueue.getAllMutationBatchesAffectingDocumentKey(key);
	    return getDocument(key, batches);
	}

	private Document getDocument(DocumentKey key, List<MutationBatch> inBatches) {
	    // 2. Fetch the remote version of the given document from the local cache
	    MutableDocument document = remoteDocumentCache.get(key);
	    for (MutationBatch batch : inBatches) {
	        // 3. Apply all mutations to the remote document
	        batch.applyToLocalView(document);
	    }
	    return document; // = RemoteDocument + Active(LocalMutations)
	}

As you can see, Firestore never stores the changed version of a document locally. It only stores what’s on the server (RemoteDocumentCache) and the local mutations (MutationQueue); those two things are enough to create the changed document you’re expecting.

LocalStore also contains configureIndices(List<FieldIndex>). When you query a lot of data, indexes become essential for maintaining good performance. Locally, Firestore creates a simplified version of an index to speed up some queries. To create those indexes it uses IndexManager, which has two implementations:

MemoryIndexManager: only supports collection parent indexing, that’s used when doing collection group queries.
SQLiteIndexManager: supports both collection parent and document field indexing.

Even though document field indexing is supported in SQLite, it appears it’s never used 🤷‍♂️.

Be aware that when a collection query is executed locally, it always iterates through all available documents. If you have a huge number of documents, that can become a problem on some devices.

	// MemoryRemoteDocumentCache.java
	while (iterator.hasNext()) {
	    ...
	    if (!query.matches(doc)) {
	        continue;
	    }

	    result = result.insert(doc.getKey(), doc.clone());
	}

	// SQLiteRemoteDocumentCache.java
	sqlQuery.forEach(
	    row -> {
	        ...
	        if (document.isFoundDocument() && query.matches(document)) {
	            ...
	        }
	    });

RemoteStore

RemoteStore handles all interaction with the backend through a simple, clean interface.

RemoteStore is the class that handles streams to talk to the backend. It utilizes WatchStream to observe data and WriteStream to write data to the backend.

WatchStream contains watchQuery that’s used to tell the backend it wants to receive changes related to a given query.
WriteStream contains writeMutations to write all the changes that happened locally to the backend.

RemoteStore polls LocalStore to request the next MutationBatch that should be sent to the backend.

	public void fillWritePipeline() {
	    ...
	    while (canAddToWritePipeline()) {
	        MutationBatch batch = localStore.getNextMutationBatch(lastBatchIdRetrieved);
	        ...
	        addToWritePipeline(batch);
	    }
	    ...
	}

writePipeline is a Deque that holds the MutationBatches that have been sent but not yet acknowledged, as well as those that are about to be sent to the server.
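
A bounded pipeline like that can be sketched in plain Java. This is a simplified model, not RemoteStore itself: SketchWritePipeline and its methods are hypothetical, batches are reduced to integer ids, the cap of 10 in-flight batches is an assumption made for the sketch, and batches are moved out of the local queue instead of being looked up by id.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SketchWritePipeline {
    // Assumed cap on in-flight batches; the value is illustrative.
    static final int MAX_PENDING_WRITES = 10;

    private final Deque<Integer> localQueue = new ArrayDeque<>();    // batches waiting locally
    private final Deque<Integer> writePipeline = new ArrayDeque<>(); // batches sent or about to be sent
    private boolean networkEnabled = true;

    void enqueueLocally(int batchId) {
        localQueue.addLast(batchId);
        fillWritePipeline();
    }

    boolean canAddToWritePipeline() {
        return networkEnabled && writePipeline.size() < MAX_PENDING_WRITES;
    }

    // Moves batches into the pipeline until it is full or nothing is waiting.
    void fillWritePipeline() {
        while (canAddToWritePipeline() && !localQueue.isEmpty()) {
            writePipeline.addLast(localQueue.pollFirst());
        }
    }

    // The backend acknowledged the oldest in-flight batch, which frees a slot.
    void acknowledgeOldest() {
        writePipeline.pollFirst();
        fillWritePipeline();
    }

    void setNetworkEnabled(boolean enabled) {
        networkEnabled = enabled;
        if (enabled) {
            fillWritePipeline();
        }
    }

    int pipelineSize() { return writePipeline.size(); }

    int waitingSize() { return localQueue.size(); }
}
```

With 12 batches enqueued, 10 end up in the pipeline and 2 keep waiting. Each acknowledgment pulls the next waiting batch in, and while the network is disabled nothing moves at all.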

You can manually call disableNetwork or enableNetwork if you want to influence how RemoteStore works. By default, it’ll use a ConnectivityMonitor to detect the network status and handle streams accordingly.

AndroidConnectivityMonitor

Determining if a user has connectivity is a problem most of us have encountered throughout our careers. In Firebase Firestore the class responsible for dealing with that is AndroidConnectivityMonitor. I won’t go over how it’s done in this article but you can check out the code here.

One interesting thing to note is that every time the app is foregrounded it checks for connectivity and calls all listeners if it’s connected.

EventManager

EventManager is responsible for mapping queries to query event listeners. It handles “fan-out.” (Identical queries will re-use the same watch on the backend.)

Earlier I said that FirestoreClient#listen shares the same data source for identical queries, this is the class that handles that.

addQueryListener is used to register a new query listener. From then on, the listener will receive the updates that belong to its query.

	public int addQueryListener(QueryListener queryListener) {
	    Query query = queryListener.getQuery();

	    QueryListenersInfo queryInfo = queries.get(query);
	    boolean firstListen = queryInfo == null;
	    if (firstListen) {
	        // QueryListenersInfo is only created if no identical query is registered
	        queryInfo = new QueryListenersInfo();
	        queries.put(query, queryInfo);
	    }
	    ...
	}

onViewSnapshots will be called when the OnlineState changes or when new data is available. It’ll dispatch the changes to all query listeners related to a given ViewSnapshot.
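
The fan-out idea can be sketched in a few lines of plain Java. This is not the SDK’s EventManager: SketchEventManager is a hypothetical class, queries and snapshots are reduced to Strings, and the backendWatches counter stands in for opening a watch on the backend.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class SketchEventManager {
    private final Map<String, List<Consumer<String>>> listenersByQuery = new HashMap<>();
    int backendWatches = 0; // how many watches were opened on the (imaginary) backend

    void addQueryListener(String query, Consumer<String> listener) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            // First listener for this query: this is the only time a watch is opened.
            listeners = new ArrayList<>();
            listenersByQuery.put(query, listeners);
            backendWatches++;
        }
        listeners.add(listener);
    }

    // Dispatches one snapshot to every listener registered for the query.
    void onViewSnapshot(String query, String snapshot) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            return;
        }
        for (Consumer<String> listener : listeners) {
            listener.accept(snapshot);
        }
    }
}
```

Registering two listeners for the same query and one for a different query opens two watches, not three, and a snapshot for the shared query reaches both of its listeners.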

SyncEngine

SyncEngine is the central controller in the client SDK architecture.

SyncEngine is the piece that makes LocalStore, RemoteStore and EventManager work together.

When you call DocumentReference#set or DocumentReference#update, the method that ends up getting called is SyncEngine#writeMutations.

	public void writeMutations(List<Mutation> mutations, TaskCompletionSource<Void> userTask) {
	    // Write mutations locally using LocalStore
	    LocalWriteResult result = localStore.writeLocally(mutations);
	    ...
	    // Dispatch changes to EventManager who updates active query listeners
	    emitNewSnapsAndNotifyLocalStore(result.getChanges(), /*remoteEvent=*/ null);
	    // Tell RemoteStore there're mutations to be sent to the backend
	    remoteStore.fillWritePipeline();
	}

It contains handleCredentialChange that gets called when the authenticated user changes. When that happens:

LocalStore is notified that the user changed and a new LocalDocumentsView is created for the new user.
RemoteStore restarts its streams.

WatchChange

A Watch Change is the internal representation of the watcher API protocol buffers.

A WatchChange basically encapsulates what is returned by the backend.

	watchStream = datastore.createWatchStream(
	    new WatchStream.Callback() {
	        ...
	        @Override
	        public void onWatchChange(SnapshotVersion snapshotVersion, WatchChange watchChange) {
	            handleWatchChange(snapshotVersion, watchChange);
	        }
	        ...
	    });

As you can see, the watch stream only receives WatchChanges. WatchChange has 3 subclasses:

DocumentChange: Represents a document change.
ExistenceFilterWatchChange: Used to verify the client has the right number of documents locally. It contains an ExistenceFilter that has only one field: count.
WatchTargetChange: Used to update TargetStates.
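
The count check behind ExistenceFilterWatchChange can be sketched like this in plain Java. SketchExistenceFilter and SketchTargetState are hypothetical names, and what the client does after detecting a mismatch is deliberately left out of the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the filter sent by the backend: just a document count.
final class SketchExistenceFilter {
    final int count;

    SketchExistenceFilter(int count) {
        this.count = count;
    }
}

// Hypothetical stand-in for what the client knows locally about one query.
final class SketchTargetState {
    private final Set<String> localKeys = new HashSet<>();

    void addDocument(String key) {
        localKeys.add(key);
    }

    void removeDocument(String key) {
        localKeys.remove(key);
    }

    // True when the local document count differs from the count reported by the
    // backend, which means the local result set can no longer be trusted.
    boolean isOutOfSync(SketchExistenceFilter filter) {
        return localKeys.size() != filter.count;
    }
}
```

Because the filter only carries a count, it can tell the client that something is off but not which document is missing or stale.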

WatchChangeAggregator

A WatchChangeAggregator is created every time a watch stream is started on RemoteStore. It receives the WatchChanges from RemoteStore and handles them. The easiest one to understand is the DocumentChange.

Whenever a new DocumentChange arrives, RemoteStore calls WatchChangeAggregator#handleDocumentChange. That causes the updated document to be added to pendingDocumentUpdates.

If the new document version is greater than the version that’s stored locally, RemoteStore will call WatchChangeAggregator#createRemoteEvent. A RemoteEvent containing the documents that were added to pendingDocumentUpdates earlier is then created and dispatched to SyncEngine#handleRemoteEvent.

	@Override
	public void handleRemoteEvent(RemoteEvent event) {
	    ...
	    ImmutableSortedMap<DocumentKey, Document> changes = localStore.applyRemoteEvent(event);
	    emitNewSnapsAndNotifyLocalStore(changes, event);
	}

SyncEngine will send the RemoteEvent to LocalStore, causing it to update RemoteDocumentCache, which is where the documents are stored locally. SyncEngine will also cause the query listeners to update.
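
Put together, the flow from document change to local cache can be modelled in plain Java. This is a simplified sketch under a few assumptions: SketchDoc, SketchAggregator and SketchRemoteDocumentCache are hypothetical classes, a remote event is reduced to a map of documents, and the version check is done per document inside the cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SketchDoc {
    final String key;
    final long version;
    final String data;

    SketchDoc(String key, long version, String data) {
        this.key = key;
        this.version = version;
        this.data = data;
    }
}

// Collects document changes until they are packaged into one remote event.
final class SketchAggregator {
    private final Map<String, SketchDoc> pendingDocumentUpdates = new LinkedHashMap<>();

    void handleDocumentChange(SketchDoc doc) {
        pendingDocumentUpdates.put(doc.key, doc); // a later change to the same key replaces the earlier one
    }

    Map<String, SketchDoc> createRemoteEvent() {
        Map<String, SketchDoc> event = new LinkedHashMap<>(pendingDocumentUpdates);
        pendingDocumentUpdates.clear();
        return event;
    }
}

// Keeps the latest known server version of each document.
final class SketchRemoteDocumentCache {
    private final Map<String, SketchDoc> docs = new HashMap<>();

    // Stores every document that is newer than the cached one and returns the changed keys.
    List<String> applyRemoteEvent(Map<String, SketchDoc> event) {
        List<String> changedKeys = new ArrayList<>();
        for (SketchDoc incoming : event.values()) {
            SketchDoc cached = docs.get(incoming.key);
            if (cached == null || incoming.version > cached.version) {
                docs.put(incoming.key, incoming);
                changedKeys.add(incoming.key);
            }
        }
        return changedKeys;
    }

    SketchDoc get(String key) {
        return docs.get(key);
    }
}
```

The list of changed keys returned by applyRemoteEvent plays the role of the changes that SyncEngine passes on to the query listeners: an event that carries only older versions changes nothing and would notify nobody.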

This was by no means a complete exploration of the library, there are dozens of topics I didn’t touch for lack of time. Now that you have a basic understanding of how things work, it should be easier for you to continue exploring. You can find the source code here.

If you enjoy learning how libraries work, take a look at my previous article explaining how Crashlytics works.

How does Crashlytics work? by Victor Brandalise

I hope you got to understand a little bit more how this amazing library works. If you have any questions or suggestions feel free to reach me on Twitter. See you in my next article.
