As an Android developer, I’m interested in tracking two types of files: Kotlin and Java. As we continue writing all of our new code in Kotlin and occasionally removing old Java code, I want to continuously monitor how much of our codebase is in Kotlin and how much is in Java. The language stats in our GitHub repo make this extremely easy. The problem is that we have all kinds of other files: Ruby files for Fastlane, XML resource files, HTML, CSS, and JavaScript asset files, and even some Python scripts that we keep in the same repo. These generate a lot of noise in our GitHub language stats. Thankfully, GitHub’s Linguist tool provides a way to filter these other files out of the language stats so I can see only the files that actually matter.
Step 1. Create a .gitattributes file at the root of your repo.
GitHub uses a tool called Linguist to detect the languages in your repo, based on file extensions. We can tell Linguist to exclude certain files from its calculations by configuring a .gitattributes file in our repo.
Step 2. Add filetypes to exclude.
In your .gitattributes file, add a linguist-vendored entry for each filetype you want to exclude. The linguist-vendored attribute tells Linguist to ignore these filetypes. For example, to exclude all .html files, you would add the following line to your .gitattributes file:
*.html linguist-vendored
Note that this is purely based on file extension, not language. So if you had some .html files and some .htm files, you would need to add both entries:
*.html linguist-vendored
*.htm linguist-vendored
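To see why both entries are needed, here is a minimal Python sketch of extension-based matching. It uses the standard library's fnmatch on the file name only, which approximates how a slash-free .gitattributes pattern is matched; it is not Linguist's actual implementation. The helper name is_excluded and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Sequence


def is_excluded(path: str, patterns: Sequence[str]) -> bool:
    """Return True if the file name matches any extension pattern.

    Approximation: a pattern without a slash (such as "*.html") is
    compared against the file name only, at any directory depth.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Hypothetical paths, for illustration only.
paths = ["assets/help/index.html", "assets/legacy/about.htm", "app/src/Main.kt"]

only_html = ["*.html"]
both = ["*.html", "*.htm"]

for p in paths:
    # Columns: path, excluded by "*.html" alone, excluded by both patterns.
    print(p, is_excluded(p, only_html), is_excluded(p, both))
```

With only the *.html pattern, the .htm file is not matched, which is why the second entry is required.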
Step 3. Add specific files to exclude.
Linguist also supports excluding specific files. For example, many Android/iOS repos use Fastlane, which adds a Gemfile to the repo root. To exclude this specific file, as well as the Gemfile.lock, add the following lines to your .gitattributes file:
Gemfile linguist-vendored
Gemfile.lock linguist-vendored
Step 4. Add directories to exclude.
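Linguist also allows us to exclude entire directories. Following on from the previous example, let’s say you want to exclude all other Fastlane files from the language stats. Since all of these are in a fastlane directory at the root of the project, the exclude looks like the line below. Note that .gitattributes patterns do not recurse into a matched directory, so fastlane/* only covers files directly inside fastlane; if you also need to cover nested subdirectories, use fastlane/** instead.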
fastlane/* linguist-vendored
For iOS projects using CocoaPods, you can exclude the Pods directory in the same way:
Pods/* linguist-vendored
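If you maintain these entries across several repos, the three kinds of exclusion from steps 2 to 4 can be generated by a small script. This is only a sketch: the function name vendored_entries and its parameters are my own invention, and the output simply mirrors the line format shown above.

```python
from typing import Iterable, List


def vendored_entries(
    extensions: Iterable[str] = (),
    files: Iterable[str] = (),
    directories: Iterable[str] = (),
) -> List[str]:
    """Build linguist-vendored lines for a .gitattributes file."""
    lines = []
    for ext in extensions:
        # Accept "html" or ".html" and emit "*.html".
        lines.append(f"*.{ext.lstrip('.')} linguist-vendored")
    for name in files:
        lines.append(f"{name} linguist-vendored")
    for directory in directories:
        # Same form as the directory entries above: files directly inside.
        lines.append(f"{directory.rstrip('/')}/* linguist-vendored")
    return lines


entries = vendored_entries(
    extensions=["html", "htm"],
    files=["Gemfile", "Gemfile.lock"],
    directories=["fastlane", "Pods"],
)
print("\n".join(entries))
```

The printed lines can be pasted into .gitattributes as-is.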
Conclusion
You can now exclude specific filetypes, specific files, or specific directories. This is extremely helpful for mobile apps trying to track Kotlin/Java or Swift/Obj-C percentages.
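As a rough local cross-check of the Kotlin/Java split, the sketch below sums file sizes by extension. GitHub's percentages are based on bytes of code per language, so this is in the same spirit, but it ignores .gitattributes and everything else Linguist does, so treat the result as an approximation. The function name byte_share and the "." path are assumptions for illustration.

```python
import os
from typing import Dict, Sequence


def byte_share(root: str, extensions: Sequence[str] = (".kt", ".java")) -> Dict[str, float]:
    """Return each extension's share (in percent) of the bytes found under root."""
    totals = {ext: 0 for ext in extensions}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in totals:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No matching files: report zero rather than dividing by zero.
        return {ext: 0.0 for ext in extensions}
    return {ext: 100.0 * size / grand_total for ext, size in totals.items()}


if __name__ == "__main__":
    # "." is a placeholder; point this at your module's source directory.
    for ext, share in byte_share(".").items():
        print(f"{ext}: {share:.1f}%")
```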
Three final notes on Linguist. First, Linguist always uses the default branch. This is unfortunate since it means you can’t test your .gitattributes file in a separate branch before merging to your default. Also note that this is not necessarily your main branch but rather your default branch, which can be configured in your GitHub repo settings. For many repos, develop will be the default branch.
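Although the GitHub stats only reflect the default branch, Git itself can report which attributes a path receives in your local working tree through the git check-attr command. The sketch below wraps that command. It assumes git is on your PATH and that the script runs inside the repo; the function names are hypothetical. It only confirms that your patterns match the paths you expect, not how Linguist will ultimately count them.

```python
import subprocess
from typing import Dict, Sequence


def parse_check_attr(output: str) -> Dict[str, str]:
    """Parse `git check-attr` output lines of the form `path: attr: value`."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split from the right so a path containing ": " stays intact.
        path, _attr, value = line.rsplit(": ", 2)
        result[path] = value
    return result


def vendored_status(paths: Sequence[str], repo_dir: str = ".") -> Dict[str, str]:
    """Ask Git which value linguist-vendored has for each path."""
    completed = subprocess.run(
        ["git", "check-attr", "linguist-vendored", "--", *paths],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_check_attr(completed.stdout)


# Example of the text Git prints; "set" means the attribute applies.
sample = (
    "Gemfile: linguist-vendored: set\n"
    "app/src/main/java/Main.kt: linguist-vendored: unspecified\n"
)
print(parse_check_attr(sample))
```

On a feature branch, calling vendored_status(["Gemfile", "fastlane/Fastfile"]) should report "set" for every path you intended to exclude.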
Second, Linguist takes a bit of time to run after merging your PR. If you merge your new .gitattributes file to your default branch and don’t see the language stats change right away, give it some time. Anecdotally, it can take about 30 minutes to an hour for the stats to update. And of course make sure you aren’t getting cached results in your browser.
Third, Linguist sometimes struggles with Obj-C files. If you find that GitHub is classifying your Obj-C files incorrectly, you can edit your .gitattributes to force an override. For example, you can force all .m and .h files to be considered Obj-C with the following .gitattributes entries:
*.h linguist-language=Objective-C
*.m linguist-language=Objective-C
And that’s it! Now you can get that pure Kotlin/Java or Swift/Obj-C comparison that you want.
This article was originally published on proandroiddev.com on March 03, 2022