Cloud Firestore is a popular database for mobile and web applications. According to its documentation:
[…] It keeps your data in sync across client apps through realtime listeners and offers offline support for mobile and web so you can build responsive apps that work regardless of network latency or Internet connectivity.
Continuing the “How does X work” series, today we’re gonna explore how Firestore works under the hood. How do listeners work? How does it send/receive data from the backend? How does it keep things stored locally? Those are some of the questions we’ll be exploring today.
FirebaseApp
FirebaseInitProvider handles the initialization of Firebase for the default project, using the data in the app’s google-services.json file. When building with Gradle, this ContentProvider is automatically merged into the app’s manifest and executed when the app is launched.
If an app needs access to another Firebase project in addition to the default one, use initializeApp(Context, FirebaseOptions, String) to do that.
FirebaseFirestore
Represents a single Cloud Firestore database. It’s probably the most used class, as it acts as a facade to other classes. It provides methods such as setFirestoreSettings, collection, document, runTransaction, runBatch, waitForPendingWrites, enableNetwork/disableNetwork, clearPersistence, etc.
FirebaseFirestoreSettings
Specifies the configuration for your Firestore instance. The configurable values are host, sslEnabled, persistenceEnabled and cacheSizeBytes.
Data Bundles
Data bundles are serialized collections of documents.
These data bundles can be saved to a CDN or another object storage provider, and then loaded from your client applications. By doing that, you can avoid making extra calls against the Firestore database.
You can read more about Data Bundles here.
FirestoreMultiDbComponent
Even though most of us only use one database instance, Firestore supports multiple databases. FirestoreMultiDbComponent is basically a container that stores references to all of them, keyed by database id.
```java
class FirestoreMultiDbComponent {
    private final Map<String, FirebaseFirestore> instances = new HashMap<>();
}
```
When you call FirebaseFirestore.getInstance(), it actually goes through FirestoreMultiDbComponent with the default database id.
```java
@NonNull
public static FirebaseFirestore getInstance() {
    FirebaseApp app = FirebaseApp.getInstance();
    ...
    return getInstance(app, DatabaseId.DEFAULT_DATABASE_ID);
}
```
Here’s the method that returns a database given its id. It’s also responsible for creating a FirebaseFirestore if none is registered yet.
```java
@NonNull
synchronized FirebaseFirestore get(@NonNull String databaseId) {
    FirebaseFirestore firestore = instances.get(databaseId);
    if (firestore == null) {
        firestore = FirebaseFirestore.newInstance(...);
        instances.put(databaseId, firestore);
    }
    return firestore;
}
```
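To make the pattern concrete, here is a self-contained sketch of the same lazy, synchronized lookup. FakeFirestore and MultiDbRegistry are hypothetical stand-ins rather than SDK classes, and "(default)" is assumed to be the default database id; the real component also wires in the FirebaseApp, credential providers and more.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FirebaseFirestore: it only remembers its database id.
final class FakeFirestore {
    final String databaseId;

    FakeFirestore(String databaseId) {
        this.databaseId = databaseId;
    }
}

// Simplified version of the FirestoreMultiDbComponent pattern: one lazily
// created instance per database id, guarded by the registry's lock.
final class MultiDbRegistry {
    // Assumed to match the id of the default database.
    static final String DEFAULT_DATABASE_ID = "(default)";

    private final Map<String, FakeFirestore> instances = new HashMap<>();

    synchronized FakeFirestore get(String databaseId) {
        // Creates the instance on the first request and reuses it afterwards.
        return instances.computeIfAbsent(databaseId, FakeFirestore::new);
    }

    FakeFirestore getDefault() {
        return get(DEFAULT_DATABASE_ID);
    }

    synchronized int size() {
        return instances.size();
    }

    public static void main(String[] args) {
        MultiDbRegistry registry = new MultiDbRegistry();
        FakeFirestore first = registry.getDefault();
        FakeFirestore second = registry.get(DEFAULT_DATABASE_ID);
        System.out.println(first == second); // true
    }
}
```

computeIfAbsent collapses the null check and put from the original snippet into a single call. The synchronized modifier still matters because HashMap is not thread-safe and getInstance can be called from any thread.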
FirestoreClient
It looks very similar to FirebaseFirestore, but here you start to see some real work being done. It’s also responsible for creating the Datastore, the class that represents the connection to the Cloud Firestore backend.
It contains waitForPendingWrites, which returns a task that resolves once all the writes that were pending at the time of the call have been acknowledged by the server.
It also contains write. According to its documentation:
Writes mutations. The returned task will be notified when it’s written to the backend.
There’s an interesting thing to note: write behaves differently based on the user’s connectivity. The mutations are applied locally right away, but the returned task only completes once they have been written to the backend, so while the user is offline the task stays pending. That may be what most people expect, but remember that Cloud Firestore can also be used as an offline database. If your app allows users to work offline, you’ll have to take that into account.
Most people have never heard of listen(Query, ListenOptions, EventListener<ViewSnapshot>), but you’ve probably used DocumentReference.addSnapshotListener or Query.addSnapshotListener; that’s the method they call to listen for changes.
listen shares the same data source for equal queries, so calling DocumentReference.addSnapshotListener with the same query from multiple places is not costly.
To accomplish this, listen relies mostly on QueryListener.
QueryListener
As you may know, Firestore is a reactive database, meaning that when somebody updates a document, you also receive the update if you’re listening to it. QueryListener is one of the classes responsible for that.
Before looking at how QueryListener works, we need to learn what a ViewSnapshot is.
ViewSnapshot
A view snapshot is an immutable capture of the results of a query and the changes to them.
Whenever you query for something, you don’t get the raw values directly. Firestore returns a QuerySnapshot, and inside it you can find a ViewSnapshot that holds the values you queried for. Take a look at QuerySnapshot#getDocuments, which converts the ViewSnapshot’s internal documents into DocumentSnapshots:
```java
@NonNull
public List<DocumentSnapshot> getDocuments() {
    // `snapshot` is the ViewSnapshot wrapped by this QuerySnapshot.
    List<DocumentSnapshot> res = new ArrayList<>(snapshot.getDocuments().size());
    for (Document doc : snapshot.getDocuments()) {
        // Convert each internal Document into the public DocumentSnapshot type.
        res.add(convertDocument(doc));
    }
    return res;
}
```
ViewSnapshot is basically a data class that holds a lot of related data together. It contains a reference to the Query that generated it, the old documents, the new documents, a list of document changes, whether the values came from the cache, etc.
Now let’s get back to QueryListener. According to the documentation:
QueryListener takes a series of internal view snapshots and determines when to raise events.
Why does it say “determines when to raise events”? Don’t all changes raise events? Not necessarily. When you start listening to a query, you can pass a MetadataChanges value, which defines what kind of changes you’ll be notified of. You have two options: INCLUDE and EXCLUDE.
Currently, document snapshots have two metadata properties: hasPendingWrites and isFromCache.
If you specify MetadataChanges.INCLUDE, you’ll also be notified when either of these two fields changes. Suppose you’re listening to collection C and a user with no connectivity writes document D to collection C. Initially hasPendingWrites will be true, because the data hasn’t been written to the backend yet. When the document is uploaded, its data probably won’t change, but hasPendingWrites will become false and you’ll receive an update for that.
What QueryListener does is basically wait for new view snapshots and decide whether it should notify you of a change based on these options.
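Here is a minimal sketch of that decision, assuming a simplified snapshot that only holds a list of document ids plus the two metadata flags. Snapshot, MetadataChanges and SketchQueryListener are hypothetical stand-ins; the real QueryListener also deals with online state and document-level metadata changes, and here the first snapshot is always delivered to keep things short.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified view snapshot: document ids plus the two metadata flags.
final class Snapshot {
    final List<String> documents;
    final boolean hasPendingWrites;
    final boolean isFromCache;

    Snapshot(List<String> documents, boolean hasPendingWrites, boolean isFromCache) {
        this.documents = documents;
        this.hasPendingWrites = hasPendingWrites;
        this.isFromCache = isFromCache;
    }
}

enum MetadataChanges { INCLUDE, EXCLUDE }

// Sketch of the QueryListener decision: data changes always raise an event,
// metadata-only changes raise one only when INCLUDE was requested.
final class SketchQueryListener {
    private final MetadataChanges metadataChanges;
    private Snapshot last;
    int eventsRaised = 0;

    SketchQueryListener(MetadataChanges metadataChanges) {
        this.metadataChanges = metadataChanges;
    }

    void onViewSnapshot(Snapshot next) {
        if (shouldRaise(next)) {
            eventsRaised++; // a real listener would call the user's EventListener here
        }
        last = next;
    }

    private boolean shouldRaise(Snapshot next) {
        if (last == null) {
            return true; // simplification: the first snapshot is always delivered
        }
        if (!Objects.equals(last.documents, next.documents)) {
            return true; // the data itself changed
        }
        boolean metadataChanged = last.hasPendingWrites != next.hasPendingWrites
                || last.isFromCache != next.isFromCache;
        return metadataChanged && metadataChanges == MetadataChanges.INCLUDE;
    }

    public static void main(String[] args) {
        SketchQueryListener listener = new SketchQueryListener(MetadataChanges.INCLUDE);
        // Offline write: document D exists locally but hasn't reached the backend.
        listener.onViewSnapshot(new Snapshot(List.of("D"), true, true));
        // Server acknowledgment: same data, only the metadata flips.
        listener.onViewSnapshot(new Snapshot(List.of("D"), false, false));
        System.out.println(listener.eventsRaised); // 2
    }
}
```

With MetadataChanges.EXCLUDE, the same two snapshots would raise a single event, because the second one only changes metadata.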
Mutation
Represents a Mutation of a document.
A mutation is exactly what you think it is: a change to something. A mutation of a document is one or more changes to that document, where a change can mean setting or removing something. There are 3 main mutations:
- DeleteMutation — represents that a document was deleted
- SetMutation — represents that a whole document was created or changed
- PatchMutation — represents that some fields in a document were created or changed
Mutations can also carry field transforms such as array union, array remove, increment and server timestamp.
Why are Mutations needed? Can’t Firestore simply change the document and store that? Let’s see a few reasons why it’s not so simple.
First, Firestore works offline. If we’re both offline and I modify one field while you modify another, we expect the document to contain both changes once we’re back online. That is much harder to achieve if only the mutated document is stored, because whichever copy is uploaded last would overwrite the other.
Second, field transforms such as array union don’t have a reference to the whole document, so Firestore needs a way to represent the transform itself so it can be applied on the server.
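A small model makes both reasons easier to see. In this hypothetical sketch, a document is a Map of fields and null means the document doesn’t exist; DocMutation and its implementations loosely mirror DeleteMutation, SetMutation, PatchMutation and an increment transform, but they are not the SDK classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a document is a map of fields; null means "no document".
interface DocMutation {
    Map<String, Object> applyTo(Map<String, Object> doc);
}

// Like SetMutation: replaces the whole document.
final class SetDoc implements DocMutation {
    private final Map<String, Object> value;

    SetDoc(Map<String, Object> value) {
        this.value = value;
    }

    public Map<String, Object> applyTo(Map<String, Object> doc) {
        return new HashMap<>(value);
    }
}

// Like PatchMutation: overwrites only the listed fields and keeps the rest.
final class PatchDoc implements DocMutation {
    private final Map<String, Object> fields;

    PatchDoc(Map<String, Object> fields) {
        this.fields = fields;
    }

    public Map<String, Object> applyTo(Map<String, Object> doc) {
        if (doc == null) {
            return null; // simplification: patching a missing document does nothing
        }
        Map<String, Object> result = new HashMap<>(doc);
        result.putAll(fields);
        return result;
    }
}

// Like DeleteMutation: the document is gone afterwards.
final class DeleteDoc implements DocMutation {
    public Map<String, Object> applyTo(Map<String, Object> doc) {
        return null;
    }
}

// Like an increment transform: only needs the field, not the whole document.
final class IncrementField implements DocMutation {
    private final String field;
    private final long delta;

    IncrementField(String field, long delta) {
        this.field = field;
        this.delta = delta;
    }

    public Map<String, Object> applyTo(Map<String, Object> doc) {
        if (doc == null) {
            return null;
        }
        Map<String, Object> result = new HashMap<>(doc);
        Object current = result.get(field);
        long base = current instanceof Long ? (Long) current : 0L; // non-numbers count as 0
        result.put(field, base + delta);
        return result;
    }
}

final class MutationDemo {
    public static void main(String[] args) {
        Map<String, Object> server = new HashMap<>(Map.of("title", "Draft", "likes", 0L));
        // Two offline clients each change a different field.
        DocMutation fromAlice = new PatchDoc(Map.of("title", "Final"));
        DocMutation fromBob = new IncrementField("likes", 1);
        // Replaying both mutations on the server copy keeps both changes.
        Map<String, Object> merged = fromBob.applyTo(fromAlice.applyTo(server));
        System.out.println(merged.get("title") + " " + merged.get("likes")); // Final 1
    }
}
```

Because each client uploads what it changed rather than a whole new document, replaying both mutations keeps both edits. Had each client uploaded its full copy, whichever arrived last would have erased the other change.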
MutationQueue
A queue of mutations to apply to the remote store.
Whenever you create, update or delete a document, a mutation is created. Mutations are submitted to MutationQueue individually, or as a group if you’re using a WriteBatch. Each submission creates a MutationBatch, which is simply a collection of mutations that will be sent to the server together.
Mutations remain in the MutationQueue until removeMutationBatch is called, which happens once the backend has acknowledged or rejected the batch.
MutationQueue is an interface and, like many other classes such as Persistence, RemoteDocumentCache, IndexManager, etc., it has 2 implementations: a memory implementation and a SQLite implementation.
If FirebaseFirestoreSettings.persistenceEnabled is true (the default on Android), the SQLite implementations of these classes are used to persist changes locally.
The memory and SQLite implementations are very similar; the main difference is where their data comes from. Let’s see how these classes handle MutationQueue#isEmpty by looking at a reduced version of the code:
```java
class MemoryMutationQueue implements MutationQueue {
    private final List<MutationBatch> queue;

    @Override
    public boolean isEmpty() {
        return queue.isEmpty();
    }
}

class SQLiteMutationQueue implements MutationQueue {
    private final SQLitePersistence db;
    private final String uid;

    @Override
    public boolean isEmpty() {
        // The table is shared by all users, so only look at this user's rows.
        return db.query("SELECT batch_id FROM mutations WHERE uid = ? LIMIT 1")
                .binding(uid)
                .isEmpty();
    }
}
```
The SQLite implementation filters by uid (user id) because the same database is shared among multiple users, while the memory implementation is instantiated per user.
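The difference is easier to see side by side. In this hypothetical sketch, SharedTable is an in-memory list standing in for the SQLite mutations table; none of these types are SDK classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down version of the MutationQueue contract.
interface SketchMutationQueue {
    int addBatch();              // returns the id of the new batch
    void removeBatch(int batchId);
    boolean isEmpty();
}

// Memory flavor: one instance per user, so there's nothing to filter.
final class MemoryQueue implements SketchMutationQueue {
    private final List<Integer> batchIds = new ArrayList<>();
    private int nextBatchId = 1;

    public int addBatch() {
        int id = nextBatchId++;
        batchIds.add(id);
        return id;
    }

    public void removeBatch(int batchId) {
        batchIds.remove(Integer.valueOf(batchId)); // remove by value, not by index
    }

    public boolean isEmpty() {
        return batchIds.isEmpty();
    }
}

// Stand-in for the SQLite "mutations" table: one store shared by every user.
final class SharedTable {
    static final class Row {
        final String uid;
        final int batchId;

        Row(String uid, int batchId) {
            this.uid = uid;
            this.batchId = batchId;
        }
    }

    final List<Row> rows = new ArrayList<>();
    int nextBatchId = 1;
}

// SQLite-like flavor: every operation is scoped to this queue's uid, like
// "SELECT batch_id FROM mutations WHERE uid = ? LIMIT 1".
final class SharedTableQueue implements SketchMutationQueue {
    private final SharedTable table;
    private final String uid;

    SharedTableQueue(SharedTable table, String uid) {
        this.table = table;
        this.uid = uid;
    }

    public int addBatch() {
        int id = table.nextBatchId++;
        table.rows.add(new SharedTable.Row(uid, id));
        return id;
    }

    public void removeBatch(int batchId) {
        table.rows.removeIf(row -> row.uid.equals(uid) && row.batchId == batchId);
    }

    public boolean isEmpty() {
        return table.rows.stream().noneMatch(row -> row.uid.equals(uid));
    }
}
```

Without the uid filter, a second user’s queue would report pending writes as soon as the first user queued one, because both read the same table.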
Persistence
Persistence is the lowest-level shared interface to persistent storage in Firestore.
What does that mean? By “lowest-level shared interface” it means shared between the memory and SQLite implementations. All the methods below have both a memory and a SQLite implementation.
- getMutationQueue returns a different MemoryMutationQueue instance per user when using the memory implementation, and a new SQLiteMutationQueue that shares the database when using the SQLite implementation.
- runTransaction uses the native transaction mechanism provided by SQLite when using the SQLite implementation, and ReferenceDelegate when dealing with the in-memory implementation.
- and so on…
```java
abstract MutationQueue getMutationQueue(User user);
abstract TargetCache getTargetCache();
abstract RemoteDocumentCache getRemoteDocumentCache();
abstract IndexManager getIndexManager();
abstract BundleCache getBundleCache();
abstract DocumentOverlay getDocumentOverlay(User user);
abstract void runTransaction(String action, Runnable operation);
// other methods...
```
This interface simplifies a lot of code: callers can talk to both the in-memory and SQLite implementations without having to deal with 2 different classes.
LocalStore
LocalStore is a final class; the same implementation is used whether or not persistence is enabled.
Having persistence disabled doesn’t mean Firestore keeps nothing locally. When persistence is disabled, that data is kept in memory; when it’s enabled, it’s kept in SQLite.
Imagine you fetched document A from Firestore and then updated it, so now you have A’. Is A’ really stored locally? Take a look at the documentation provided by Firestore:
The local store provides the local version of documents that have been modified locally. It maintains the constraint: LocalDocument = RemoteDocument + Active(LocalMutations)
The only things stored locally (either in memory or in SQLite) are RemoteDocuments and Mutations; the A’ document you see is a combination of those 2 things. Here’s how Firestore does that:
- You call FirebaseFirestore#document to get a DocumentReference.
- You call DocumentReference#get(Source.CACHE) to force Firestore to return the document available locally.
- DocumentReference calls FirestoreClient#getDocumentFromLocalCache, which calls LocalStore#readDocument.
- LocalStore calls LocalDocumentsView#getDocument.
```java
// LocalDocumentsView.java
Document getDocument(DocumentKey key) {
    // 1. Get all mutations for the given document
    List<MutationBatch> batches = mutationQueue.getAllMutationBatchesAffectingDocumentKey(key);
    return getDocument(key, batches);
}

private Document getDocument(DocumentKey key, List<MutationBatch> inBatches) {
    // 2. Fetch the remote version of the given document from the local cache
    MutableDocument document = remoteDocumentCache.get(key);
    for (MutationBatch batch : inBatches) {
        // 3. Apply all mutations to the remote document
        batch.applyToLocalView(document);
    }
    return document; // = RemoteDocument + Active(LocalMutations)
}
```
As you can see, Firestore never stores the changed version of a document locally. It only stores what’s on the server (RemoteDocumentCache) and the local mutations (MutationQueue); those two things are enough to rebuild the changed document you’re expecting.
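The same idea as a self-contained sketch, assuming documents are field maps and every pending write is a simple field patch (deletes and transforms are left out). LocalDocumentsSketch is hypothetical and only mirrors the RemoteDocumentCache plus MutationQueue split described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of LocalDocument = RemoteDocument + Active(LocalMutations).
final class LocalDocumentsSketch {
    // Last known server version of each document (the RemoteDocumentCache role).
    private final Map<String, Map<String, Object>> remoteDocuments = new HashMap<>();

    // Pending local writes in submission order (the MutationQueue role).
    // Each entry is a field patch for one document key.
    private final List<Map.Entry<String, Map<String, Object>>> pendingPatches = new ArrayList<>();

    void applyRemote(String key, Map<String, Object> serverValue) {
        remoteDocuments.put(key, new HashMap<>(serverValue));
    }

    void writeLocally(String key, Map<String, Object> patch) {
        pendingPatches.add(Map.entry(key, patch));
    }

    // Writes are acknowledged in order: the oldest patch leaves the queue and
    // the server copy, which now includes it, replaces the cached one.
    void acknowledgeOldest(Map<String, Object> serverValueAfterWrite) {
        Map.Entry<String, Map<String, Object>> acked = pendingPatches.remove(0);
        applyRemote(acked.getKey(), serverValueAfterWrite);
    }

    // Mirrors LocalDocumentsView#getDocument: start from the remote copy and
    // replay every pending patch that targets the same key.
    Map<String, Object> readDocument(String key) {
        Map<String, Object> doc = new HashMap<>(remoteDocuments.getOrDefault(key, Map.of()));
        for (Map.Entry<String, Map<String, Object>> patch : pendingPatches) {
            if (patch.getKey().equals(key)) {
                doc.putAll(patch.getValue());
            }
        }
        return doc;
    }
}
```

Acknowledging the write doesn’t change what readDocument returns: the patch leaves the queue, but the cached server copy now already contains it.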
LocalStore also contains configureIndices(List<FieldIndex>). When you query a lot of data, indexes become essential for maintaining good performance. Locally, Firestore creates a simplified version of an index to speed up some queries. To create those indexes it uses IndexManager, which has two implementations:
- MemoryIndexManager: only supports collection parent indexing, which is used for collection group queries.
- SQLiteIndexManager: supports both collection parent and document field indexing.
At the time of writing, even though document field indexing is supported in SQLite, it appears it’s never used 🤷‍♂️.
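For the collection parent index, a rough sketch of the idea (based on the description above, not copied from MemoryIndexManager) is a map from a collection id to every parent path under which a collection with that id exists. A collection group query can then be expanded into one collection query per parent. CollectionParentIndexSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a collection parent index.
final class CollectionParentIndexSketch {
    private final Map<String, Set<String>> parentsByCollectionId = new HashMap<>();

    // collectionPath looks like "posts/p1/comments": the id is the last segment,
    // the parent is everything before it ("" for a root collection).
    void add(String collectionPath) {
        int slash = collectionPath.lastIndexOf('/');
        String collectionId = collectionPath.substring(slash + 1);
        String parent = slash < 0 ? "" : collectionPath.substring(0, slash);
        parentsByCollectionId.computeIfAbsent(collectionId, id -> new TreeSet<>()).add(parent);
    }

    // A collection group query on `collectionId` becomes one collection
    // query per known parent path.
    Set<String> collectionsFor(String collectionId) {
        Set<String> paths = new TreeSet<>();
        for (String parent : parentsByCollectionId.getOrDefault(collectionId, Set.of())) {
            paths.add(parent.isEmpty() ? collectionId : parent + "/" + collectionId);
        }
        return paths;
    }
}
```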
Be aware that when a collection query is executed locally, it iterates through every cached document in the queried collection. If you have a huge number of documents, that can become a problem on some devices.
```java
// MemoryRemoteDocumentCache.java
while (iterator.hasNext()) {
    ...
    if (!query.matches(doc)) {
        continue;
    }
    result = result.insert(doc.getKey(), doc.clone());
}

// SQLiteRemoteDocumentCache.java
sqlQuery.forEach(
    row -> {
        ...
        if (document.isFoundDocument() && query.matches(document)) {
            ...
        }
    });
```
RemoteStore
RemoteStore handles all interaction with the backend through a simple, clean interface.
RemoteStore is the class that manages the streams used to talk to the backend. It uses WatchStream to observe data and WriteStream to write data to the backend.
- WatchStream contains watchQuery, which tells the backend the client wants to receive changes related to a given query.
- WriteStream contains writeMutations, which writes the changes that happened locally to the backend.
RemoteStore polls LocalStore for the next MutationBatch that should be sent to the backend.
```java
public void fillWritePipeline() {
    ...
    while (canAddToWritePipeline()) {
        MutationBatch batch = localStore.getNextMutationBatch(lastBatchIdRetrieved);
        ...
        addToWritePipeline(batch);
    }
    ...
}
```
writePipeline is a Deque that holds the MutationBatches that were sent but haven’t been acknowledged yet, as well as those about to be sent to the server.
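The interplay between the mutation queue and the pipeline can be sketched like this. It is hypothetical: plain batch ids stand in for MutationBatches, the network is assumed to be available, and the cap of 10 in-flight batches is an assumption about how RemoteStore bounds the pipeline.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a bounded write pipeline fed from the mutation queue.
final class WritePipelineSketch {
    static final int MAX_PENDING_WRITES = 10; // assumed cap on in-flight batches

    private final List<Integer> queuedBatchIds; // ascending ids, stands in for LocalStore
    private final Deque<Integer> writePipeline = new ArrayDeque<>();
    private int lastBatchIdRetrieved = -1;
    final List<Integer> sentToBackend = new ArrayList<>();

    WritePipelineSketch(List<Integer> queuedBatchIds) {
        this.queuedBatchIds = queuedBatchIds;
    }

    // Like localStore.getNextMutationBatch: the first batch after the given id, or null.
    private Integer nextBatchAfter(int batchId) {
        for (int id : queuedBatchIds) {
            if (id > batchId) {
                return id;
            }
        }
        return null;
    }

    void fillWritePipeline() {
        while (writePipeline.size() < MAX_PENDING_WRITES) {
            Integer batch = nextBatchAfter(lastBatchIdRetrieved);
            if (batch == null) {
                break; // nothing left to send
            }
            lastBatchIdRetrieved = batch;
            writePipeline.addLast(batch);
            sentToBackend.add(batch); // a real client would write it to the WriteStream
        }
    }

    // Acknowledgments arrive in order, so the oldest in-flight batch is the one confirmed.
    void onWriteAcknowledged() {
        int acked = writePipeline.removeFirst();
        queuedBatchIds.remove(Integer.valueOf(acked)); // like removeMutationBatch
        fillWritePipeline(); // make room for the next batch
    }

    int inFlight() {
        return writePipeline.size();
    }
}
```

Tracking lastBatchIdRetrieved means batches that are already in flight are never fetched twice, while each acknowledgment frees a slot for the next queued batch.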
You can manually call disableNetwork or enableNetwork if you want to influence how RemoteStore works. By default, it uses a ConnectivityMonitor to detect the network status and handle the streams accordingly.
AndroidConnectivityMonitor
Determining whether a user has connectivity is a problem most of us have encountered throughout our careers. In Cloud Firestore, the class responsible for dealing with that is AndroidConnectivityMonitor. I won’t go over how it’s done in this article, but you can check out the code here.
One interesting thing to note is that every time the app comes to the foreground, it checks for connectivity and, if the device is connected, notifies all listeners.
EventManager
EventManager is responsible for mapping queries to query event listeners. It handles “fan-out.” (Identical queries will re-use the same watch on the backend.)
Earlier I said that FirestoreClient#listen shares the same data source for identical queries; this is the class that handles that.
addQueryListener registers a new query listener; from then on, the listener receives the updates belonging to its query.
```java
public int addQueryListener(QueryListener queryListener) {
    Query query = queryListener.getQuery();
    QueryListenersInfo queryInfo = queries.get(query);
    boolean firstListen = queryInfo == null;
    if (firstListen) {
        // QueryListenersInfo is only created if no identical query is registered
        queryInfo = new QueryListenersInfo();
        queries.put(query, queryInfo);
    }
    ...
}
```
onViewSnapshots is called when the OnlineState changes or when new data is available. It dispatches the changes to all query listeners related to each ViewSnapshot.
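A hypothetical sketch of the fan-out idea, with queries and snapshots reduced to plain strings. Only the first listener for a query opens a watch (counted in activeWatches) and the last one to leave closes it; FanOutSketch is not the real EventManager API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of EventManager-style fan-out: many listeners share one
// backend watch per query.
final class FanOutSketch {
    private final Map<String, List<Consumer<String>>> listenersByQuery = new HashMap<>();
    int activeWatches = 0; // watches currently open on the "backend"

    void addQueryListener(String query, Consumer<String> listener) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        boolean firstListen = listeners == null;
        if (firstListen) {
            // Only the first listener for a query opens a new watch.
            listeners = new ArrayList<>();
            listenersByQuery.put(query, listeners);
            activeWatches++;
        }
        listeners.add(listener);
    }

    void removeQueryListener(String query, Consumer<String> listener) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            return;
        }
        listeners.remove(listener);
        if (listeners.isEmpty()) {
            // The last listener left, so the watch can be closed.
            listenersByQuery.remove(query);
            activeWatches--;
        }
    }

    // Plays the role of onViewSnapshots: one snapshot, delivered to every listener of its query.
    void onViewSnapshot(String query, String snapshot) {
        for (Consumer<String> listener : listenersByQuery.getOrDefault(query, List.of())) {
            listener.accept(snapshot);
        }
    }
}
```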
SyncEngine
SyncEngine is the central controller in the client SDK architecture.
SyncEngine is the piece that makes LocalStore, RemoteStore and EventManager work together.
When you call DocumentReference#set or DocumentReference#update, the method that ends up being called is SyncEngine#writeMutations.
```java
public void writeMutations(List<Mutation> mutations, TaskCompletionSource<Void> userTask) {
    // Write mutations locally using LocalStore
    LocalWriteResult result = localStore.writeLocally(mutations);
    ...
    // Dispatch changes to EventManager, which updates active query listeners
    emitNewSnapsAndNotifyLocalStore(result.getChanges(), /*remoteEvent=*/ null);
    // Tell RemoteStore there are mutations to be sent to the backend
    remoteStore.fillWritePipeline();
}
```
It contains handleCredentialChange, which is called when the authenticated user changes. When that happens:
- LocalStore is notified that the user changed, and a new LocalDocumentsView is created for the new user.
- RemoteStore restarts its streams.
WatchChange
A Watch Change is the internal representation of the watcher API protocol buffers.
A WatchChange basically encapsulates what is returned by the backend.
```java
watchStream = datastore.createWatchStream(
    new WatchStream.Callback() {
        ...
        @Override
        public void onWatchChange(SnapshotVersion snapshotVersion, WatchChange watchChange) {
            handleWatchChange(snapshotVersion, watchChange);
        }
        ...
    });
```
As you can see, the watch stream only receives WatchChanges, and WatchChange has 3 subclasses:
- DocumentChange: represents a document change.
- ExistenceFilterWatchChange: used to verify that the client has the right number of documents locally. It contains an ExistenceFilter that has only one field: count.
- WatchTargetChange: used to update TargetStates.
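A hypothetical sketch of how a client could react to these three kinds of change for a single target. The class names are simplified stand-ins, and the reset logic is an assumption that follows the stated purpose of ExistenceFilter: if the backend’s count doesn’t match what the client tracks, some change was missed, so the local state for that target is dropped and the target has to be listened to again.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for the three WatchChange flavors.
abstract class SketchWatchChange {
    // A document was added, modified or removed for this target.
    static final class DocumentChange extends SketchWatchChange {
        final String key;
        final boolean exists;

        DocumentChange(String key, boolean exists) {
            this.key = key;
            this.exists = exists;
        }
    }

    // The backend's count of documents matching the target.
    static final class ExistenceFilterChange extends SketchWatchChange {
        final int count;

        ExistenceFilterChange(int count) {
            this.count = count;
        }
    }

    // Target state update, reduced here to "is the target in sync?".
    static final class TargetChange extends SketchWatchChange {
        final boolean current;

        TargetChange(boolean current) {
            this.current = current;
        }
    }
}

// Tracks what the client knows about a single target.
final class SketchTargetState {
    final Set<String> keys = new HashSet<>();
    boolean current = false;
    boolean needsReset = false;

    void handle(SketchWatchChange change) {
        if (change instanceof SketchWatchChange.DocumentChange) {
            SketchWatchChange.DocumentChange doc = (SketchWatchChange.DocumentChange) change;
            if (doc.exists) {
                keys.add(doc.key);
            } else {
                keys.remove(doc.key);
            }
        } else if (change instanceof SketchWatchChange.ExistenceFilterChange) {
            int expected = ((SketchWatchChange.ExistenceFilterChange) change).count;
            if (keys.size() != expected) {
                // Local view is out of sync (e.g. a removal was missed): start over.
                keys.clear();
                current = false;
                needsReset = true;
            }
        } else if (change instanceof SketchWatchChange.TargetChange) {
            current = ((SketchWatchChange.TargetChange) change).current;
        }
    }
}
```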
WatchChangeAggregator
A WatchChangeAggregator is created every time a watch stream is started in RemoteStore. It receives the WatchChanges from RemoteStore and handles them. The easiest one to understand is DocumentChange.
Whenever a new DocumentChange arrives, RemoteStore calls WatchChangeAggregator#handleDocumentChange, which adds the updated document to pendingDocumentUpdates.
If the snapshot version that arrives with a change is at least as recent as the last remote snapshot version stored locally, RemoteStore calls WatchChangeAggregator#createRemoteEvent. That creates a RemoteEvent containing the documents added to pendingDocumentUpdates earlier, which is then dispatched to SyncEngine#handleRemoteEvent.
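The buffering can be sketched as follows. AggregatorSketch and RemoteStoreSketch are hypothetical, document contents are plain strings, and the version check is simplified to “don’t raise an event for a snapshot older than the last one applied”.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of WatchChangeAggregator-style buffering.
final class AggregatorSketch {
    // Immutable set of changes handed to the sync engine in one go.
    static final class RemoteEvent {
        final long snapshotVersion;
        final Map<String, String> documentUpdates;

        RemoteEvent(long snapshotVersion, Map<String, String> documentUpdates) {
            this.snapshotVersion = snapshotVersion;
            this.documentUpdates = documentUpdates;
        }
    }

    private final Map<String, String> pendingDocumentUpdates = new HashMap<>();

    void handleDocumentChange(String key, String contents) {
        // A later change to the same key replaces the earlier one.
        pendingDocumentUpdates.put(key, contents);
    }

    RemoteEvent createRemoteEvent(long snapshotVersion) {
        RemoteEvent event = new RemoteEvent(snapshotVersion, Map.copyOf(pendingDocumentUpdates));
        pendingDocumentUpdates.clear(); // start buffering for the next snapshot
        return event;
    }
}

// Hypothetical stand-in for RemoteStore's side of the exchange.
final class RemoteStoreSketch {
    private final AggregatorSketch aggregator = new AggregatorSketch();
    private long lastRemoteSnapshotVersion = 0;

    void onDocumentChange(String key, String contents) {
        aggregator.handleDocumentChange(key, contents);
    }

    // Returns the event that would go to SyncEngine#handleRemoteEvent, or null
    // if the snapshot version is older than the one already applied.
    AggregatorSketch.RemoteEvent onSnapshotVersion(long snapshotVersion) {
        if (snapshotVersion < lastRemoteSnapshotVersion) {
            return null;
        }
        lastRemoteSnapshotVersion = snapshotVersion; // LocalStore would persist this
        return aggregator.createRemoteEvent(snapshotVersion);
    }
}
```

Buffering means listeners see one consistent batch of changes per snapshot instead of a stream of half-applied document updates.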
```java
@Override
public void handleRemoteEvent(RemoteEvent event) {
    ...
    ImmutableSortedMap<DocumentKey, Document> changes = localStore.applyRemoteEvent(event);
    emitNewSnapsAndNotifyLocalStore(changes, event);
}
```
SyncEngine sends the RemoteEvent to LocalStore, causing it to update RemoteDocumentCache, which is where documents are stored locally. SyncEngine also causes the query listeners to update.
This was by no means a complete exploration of the library; there are dozens of topics I didn’t touch for lack of time. Now that you have a basic understanding of how things work, it should be easier for you to continue exploring. You can find the source code here.
If you enjoy learning how libraries work, take a look at my previous article explaining how Crashlytics works.
How does Crashlytics work? by Victor Brandalise
I hope this helped you understand a little better how this amazing library works. If you have any questions or suggestions, feel free to reach out to me on Twitter. See you in my next article.