Spring Cleaning: Tidying up your codebase

As the weather warms up, I get energized to give my codebase a spring cleaning. I’ve worked on projects where the mess was so bad that we were afraid to touch anything, and I’ve worked on projects where I aggressively tried to get to code neutral (deleting more lines of code than I added). The right balance lies somewhere in the middle.

Over the years, I’ve found a few tasks that I think provide the biggest bang for the buck: easy, low-risk things you can do in under an hour to make your codebase a little bit more inhabitable. Do one every day for a week, or go all-out one Friday afternoon and see how much you can finish.

Tidy up your dependencies

Tools like bundler and yarn generate a bunch of files and folders for your project, but most of the mess is “out of sight, out of mind” as they shuffle things under the proverbial rug of node_modules.

And we’re not even talking about unused dependencies, just old versions of libraries sitting around since you’ve upgraded or tried out new tools in a branch.

Bundler: Run bundle clean to remove unused gems from your bundler directory.

➜ bundle clean --dry-run
Would have removed rails-html-sanitizer (1.2.0)
Would have removed jekyll-sitemap (1.3.1)
Would have removed que (0.14.3)
Would have removed aws-sdk-kms (1.28.0)
Would have removed actioncable (6.0.2.1)
Would have removed rake (12.3.2)
Would have removed administrate (0.12.0)
Would have removed factory_bot (5.1.0)
Would have removed sass (3.7.4)
Would have removed html-pipeline (2.12.3)
Would have removed minitest (5.11.3)
Would have removed uniform_notifier (1.12.1)
Would have removed parallel (1.19.1)
Would have removed colorator (1.1.0)

One caveat is that if you use system-level gems and have multiple Ruby projects, running bundle clean will try to remove global gems not used in your current project. This is probably not what you want.

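To avoid this, switch to using per-project bundles. Run bundle config set --local path vendor/bundle and then bundle install to install gems to a project-specific folder (on older Bundler versions, bundle install --path vendor/bundle does the same thing; that flag is deprecated in Bundler 2.1 and later). After that, bundle clean removes unused gems for this project without touching gems for other projects.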

Tip: You can disable downloading gem documentation locally by adding gem: --no-document to ~/.gemrc

Yarn: node_modules has a bad reputation for ballooning in size. As the nested folders get larger and larger, you can even start running into OS-level limitations when trying to delete this massive pile of JavaScript.

Run yarn autoclean --init to generate a template file that slims down your node_modules folder by removing cruft like test files, markdown files, and other miscellaneous junk that sneaks into the published packages.

After adding a .yarnclean file, the scrubbing process will run every time you add or install packages (and you can also run it manually).

➜ yarn autoclean --init
yarn autoclean v1.17.0
[1/1] Creating ".yarnclean"...
info Created ".yarnclean". Please review the contents of this file then run "yarn autoclean --force" to perform a clean.

➜ yarn autoclean --force
yarn autoclean v1.17.0
[1/1] Cleaning modules...
info Removed 4799 files
info Saved 17.75 MB.
✨  Done in 4.37s.

Tip: Yarn automatically prunes extraneous packages whenever you run the install command so no need to do it yourself.

If you’re really feeling ambitious, audit your dependencies to see if any can be removed. Mike Perham’s excellent Kill Your Dependencies article has a checklist to use when evaluating external libraries:

Every dependency in your application has the potential to bloat your app, to destabilize your app, to inject odd behavior via monkeypatching or buggy native code. When you are considering adding a dependency to your Rails app, it’s a good idea to do a quick sanity check, in order of preference:

Do I really need this at all? Kill it. [Uninstall the gem]

Can I implement the required minimal functionality myself? Own it. [Copy/vendor the code]

If you need a gem:

Does the gem have a native extension? Look for pure ruby alternatives. [Switch gems]

Does the gem transitively pull in a lot of other gems? Look for simpler alternatives. [Switch gems]

Gems with native extensions can destabilize your system; they can be the source of mysterious bugs and crashes. Avoid gems which pull in more dependencies than their value warrants. Example of a bad gem: the fog gem which pulls in 39 gems, more dependencies than rails itself and most of which are unnecessary.
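To put a rough number on the "transitively pulls in a lot of other gems" question, here is a minimal sketch that counts what a single gem drags along. It is not part of Perham's article. It hand-parses the specs section of a Gemfile.lock, assuming the standard layout Bundler writes (four spaces before a locked gem, six before each of its dependencies). The method names and the sample lockfile are hypothetical and only there for illustration.

```ruby
require "set"

# Maps each locked gem to the names of its direct dependencies, assuming the
# standard Gemfile.lock indentation (4 spaces for a gem, 6 for a dependency).
def direct_dependencies(lockfile_contents)
  deps = {}
  current = nil
  lockfile_contents.each_line do |line|
    case line
    when /\A {4}(\S+) \(/ # a locked gem, e.g. "    rails (6.0.2.1)"
      current = Regexp.last_match(1)
      deps[current] ||= []
    when /\A {6}(\S+)/ # a dependency of the gem above
      deps[current] << Regexp.last_match(1) if current
    end
  end
  deps
end

# Returns the sorted names of every gem that `gem_name` pulls in, directly or
# through other gems.
def transitive_dependencies(lockfile_contents, gem_name)
  direct = direct_dependencies(lockfile_contents)
  seen = Set.new
  queue = direct.fetch(gem_name, []).dup
  until queue.empty?
    name = queue.shift
    next if seen.include?(name)

    seen << name
    queue.concat(direct.fetch(name, []))
  end
  seen.to_a.sort
end

# Hypothetical lockfile, for illustration only.
SAMPLE_LOCKFILE = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      alpha (1.0.0)
        beta (>= 1.0)
        gamma
      beta (1.0.0)
        gamma
      delta (1.0.0)
      gamma (1.0.0)

  PLATFORMS
    ruby

  DEPENDENCIES
    alpha
    delta
LOCK

p transitive_dependencies(SAMPLE_LOCKFILE, "alpha") # prints ["beta", "gamma"]
```

In a real project you would pass File.read("Gemfile.lock") instead of the sample string and compare the counts for the gems you are least sure about.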

Prune your git branches

If running git branch -r fills your terminal with the ghosts of features past, you should clean things up! Keeping your branches tidy makes it easy to see which branches are open without scrolling through tens (or hundreds!) of lines.

Run git remote prune origin to delete stale remote-tracking references (the origin/... entries in your local clone) for branches that no longer exist on origin. You can use --dry-run first if you’re worried.

If you run git branch -r again, you should see a slimmer list that only shows the branches that still exist on GitHub.

You can then either go to GitHub directly or use git ls-remote --heads origin to list the current remote branches. If there are any to remove, delete them by running git push origin --delete BRANCH_TO_DELETE.
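If the list of remote branches is long, it can help to narrow it down to branches that are already merged. The sketch below is one way to do that, under a few assumptions: the remote is called origin, the default branch is main, and the input is the text printed by git branch -r --merged origin/main. The method name and the sample output are made up for illustration. The script only prints the delete commands so you can review them before running anything.

```ruby
# Turns the output of `git branch -r --merged <remote>/<base>` into a list of
# branch names that are candidates for deletion. The HEAD pointer line and the
# base branch itself are skipped.
def merged_branch_candidates(git_output, remote: "origin", base: "main")
  prefix = "#{remote}/"
  git_output.each_line
            .map(&:strip)
            .reject { |line| line.empty? || line.include?("->") }
            .select { |line| line.start_with?(prefix) }
            .map { |line| line.delete_prefix(prefix) }
            .reject { |branch| branch == base }
end

# Hypothetical output, for illustration only. In a real repository you could
# capture it with: `git branch -r --merged origin/main`
SAMPLE_GIT_OUTPUT = <<~OUT
  origin/HEAD -> origin/main
  origin/main
  origin/add-login-page
  origin/fix-typo
OUT

# Print the commands instead of running them, so nothing is deleted by accident.
merged_branch_candidates(SAMPLE_GIT_OUTPUT).each do |branch|
  puts "git push origin --delete #{branch}"
end
```

Keep in mind that "merged" here means merged according to git history, so branches that were squash-merged on GitHub will not show up in this list.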


Tip: Turn on “Automatically delete head branches” in GitHub so your branches don’t hang around after you’ve merged a pull request

Remove unused routes and views

While the Rails generators spark joy for me, they often generate more routes and views than you actually want. Apply your inner Marie Kondo and excise these unused parts of your app.

First try searching your app for “Find me in”: this is the default generator text for some views that Rails has generated. If you have any results, congratulations! You can delete these files with impunity.
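Your editor's project search works fine for this, but if you want something scriptable, here is a minimal sketch. It assumes your views live under app/views; the method name is hypothetical.

```ruby
# Returns the files under `views_dir` that still contain the placeholder text
# Rails writes into generated views.
def placeholder_views(views_dir, marker: "Find me in")
  Dir.glob(File.join(views_dir, "**", "*"))
     .select { |path| File.file?(path) }
     .select { |path| File.read(path).include?(marker) }
     .sort
end

# Run from the root of a Rails app. Prints nothing if no placeholders are found.
placeholder_views("app/views").each { |path| puts path }
```

Give each hit a quick look before deleting it, in case the placeholder text was left inside a view that has since gained real content.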

Next up: check your routes.rb file for unused routes by looking for controller actions that aren’t implemented (or are missing views).

I found the best results from this gist. Drop the script into the root of your Rails app and run it to see a list of unused routes to potentially clean up.
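The gist isn't reproduced here, but to give a feel for the kind of check involved, here is a rough sketch of the core idea (not the gist's actual code): take each route's controller and action, and flag the ones where no controller class or public action method exists. The method name and the stand-in controller are hypothetical, and the stand-in is there only so the example runs without Rails. One known blind spot is that Rails can render a view for an action with no method defined, so this sketch would flag those too.

```ruby
# Returns the [controller, action] pairs whose controller class cannot be found
# or does not define a public method for the action.
def unimplemented_routes(route_pairs, controller_lookup)
  route_pairs.uniq.reject do |controller, action|
    klass = controller_lookup.call(controller)
    klass && klass.public_method_defined?(action)
  end
end

# Inside a Rails app, the inputs could be built roughly like this (an untested
# assumption, shown as a comment because it needs a loaded Rails environment):
#   pairs  = Rails.application.routes.routes
#              .map { |route| route.defaults.values_at(:controller, :action) }
#              .reject { |controller, action| controller.nil? || action.nil? }
#   lookup = ->(name) { "#{name}_controller".camelize.safe_constantize }

# Stand-in controller so the sketch runs without Rails.
class PostsController
  def index; end
end

DEMO_ROUTES = [%w[posts index], %w[posts destroy], %w[comments index]].freeze
DEMO_LOOKUP = ->(name) { name == "posts" ? PostsController : nil }

unimplemented_routes(DEMO_ROUTES, DEMO_LOOKUP).each do |controller, action|
  puts "#{controller}##{action} has no matching controller action"
end
```

Treat the output as a list of leads rather than a delete list, since actions inherited from gems or rendered implicitly from templates need a manual check.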

Checking for unused database tables and columns

Another area to check is your database, which may have accumulated unused tables or columns. One clever approach is to look for empty tables and columns that have the same value for every row. These are suspicious places to investigate.

Paste this script (adapted from this article) into your project to scan your database (you may want to run against a staging database if your development database does not have representative data):

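require_relative './config/environment.rb'

connection = ActiveRecord::Base.connection

connection.tables.each do |table|
  # Quote identifiers so tables or columns named after SQL keywords don't break the queries.
  quoted_table = connection.quote_table_name(table)

  # An empty table is a candidate for removal.
  count = connection.select_all("SELECT COUNT(*) AS count FROM #{quoted_table}", "Count").first['count']
  puts "TABLE UNUSED #{table}" if count.to_i == 0

  columns = connection.columns(table).map(&:name).reject { |name| name == 'id' }
  columns.each do |column|
    quoted_column = connection.quote_column_name(column)

    # Fetch at most two distinct values: exactly one means every row holds the same value.
    values = connection.select_all("SELECT DISTINCT #{quoted_column} AS val FROM #{quoted_table} LIMIT 2", "Distinct Check")
    next unless values.count == 1

    if values.first['val'].nil?
      puts "COLUMN UNUSED #{table}:#{column}"
    else
      puts "COLUMN SINGLE VALUE #{table}:#{column} -- #{values.first['val']}"
    end
  end
end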

➜ ruby unused_db.rb
TABLE UNUSED active_storage_blobs
TABLE UNUSED friendly_id_slugs
TABLE UNUSED active_storage_attachments
COLUMN SINGLE VALUE investor_agreements:file_extension -- pdf
COLUMN UNUSED clients:comments
COLUMN SINGLE VALUE client_tasks:archived -- false
...

In addition to some unused generated tables, the script also found dozens of columns with all null values or all single values (e.g. an archived column that is always “false”). These columns require more investigation before you remove them, but they make a good list to start from.

Check for missing validations / constraints

While ActiveRecord provides a validation layer on top of your database, you’ll still want the strong protection of database constraints (non-nullable columns, unique indices, etc.) to make sure no bad data sneaks into your app.

It’s really easy for your models and underlying database to drift apart. Use the database_consistency gem to run a series of checks that tell you where your application models and database schema are out of sync.

➜ bundle exec database_consistency
fail ProjectType name column is required in the database but do not have presence validator
fail ProjectType slug column is required in the database but do not have presence validator
fail Project draft_documents associated model should have proper index in the database
fail Project attachments associated model should have proper index in the database
fail Project alerts associated model should have proper index in the database
fail Company name column should be required in the database
fail PurchaseApproval date column should be required in the database
...

You may be overwhelmed by the volume of errors this tool spits out for a project, but don’t fret: every change should be straightforward, and you can handle them bit by bit. Once you resolve everything, consider adding this command to your Rails CI pipeline to catch future errors right when the offending code is committed.

Clear out your migrations folder

The final place to check is your migrations. Over time, your db/migrate folder will build up to hundreds of migration files. If you’ve already run these migrations on production environments (and if you’re avoiding creating seed data directly in migrations), you can safely delete older migrations.

I like the approach outlined by Clutter:

Delete migrations older than 3 months
Add a new migration that raises if your database was very out-of-date (with instructions on how to load the schema)
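To see which files the first step would touch, here is a minimal sketch that lists migrations older than a cutoff. It assumes the standard Rails naming scheme (a 14-digit UTC timestamp, an underscore, then the migration name) and treats "3 months" as 90 days. The method and constant names are hypothetical. It only prints candidates and deletes nothing.

```ruby
# Returns the migration files whose timestamp prefix is older than `cutoff`.
# Files that don't follow the YYYYMMDDHHMMSS_name.rb pattern are ignored.
def stale_migrations(file_names, cutoff)
  cutoff_stamp = cutoff.getutc.strftime("%Y%m%d%H%M%S")
  file_names.select do |name|
    stamp = File.basename(name)[/\A(\d{14})_/, 1]
    # Timestamps are fixed-width digit strings, so string comparison matches time order.
    stamp && stamp < cutoff_stamp
  end
end

NINETY_DAYS = 90 * 24 * 60 * 60 # roughly three months, in seconds

# Run from the root of a Rails app. Prints nothing if there is no db/migrate folder.
stale_migrations(Dir.glob("db/migrate/*.rb").sort, Time.now - NINETY_DAYS).each do |path|
  puts path
end
```

Before deleting anything on the list, make sure db/schema.rb (or structure.sql) is up to date and committed, since it becomes the only record of those changes.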

If deleting the migrations makes you uneasy, you can also look at squasher: a tool that combines your old migrations into one mega-migration.

Deep cleaning: going even further

If you’re ready to deep clean, consider trying out these tools to help you dig further into your application code:

unused: a general-purpose tool for finding dead code, based on ctags
debride: a Ruby tool to find potentially uncalled methods (with some specific Rails checks)
coverband: a “run in production” coverage tool that collects stats on method usage; it’s the most thorough but has a longer turn around time as you need to run it for a while against real traffic
rcov: depending on your test suite, you can get a good picture of what parts of your application may not be used
attractor: metrics visualizer for plotting churn (how often code changes) vs complexity

I haven’t had as much success with these tools and generally don’t find them to be a good ROI, but if you’re looking to keep your codebase sparkling clean, it’s worth spending some time exploring. Personally, I’m okay with a little bit of dust :)

Happy cleaning! If you’ve got any other tips that you’ve used on your projects, let me know on Twitter.
