🛠️ mighty sync

Almost all small organizations have similar records spread out across many systems, for example, a CRM such as Salesforce, a mailing list in Mailchimp, and assorted spreadsheets.

These are hard to keep in sync, so duplicates are common and information gets lost ("Jane Smith" now goes by "Jane Doe-Smith", "Jonathan Doe" used "Jon" in some forms, "Alex Smith" used different emails when submitting different forms). Unless the organization is a tech company, has dedicated technical staff, or hires someone to manually "fix things", the data stays a mess.

Mighty Sync is an application that aims to solve this problem by keeping any two sources of similar data "in sync" (and, as a bonus, finding duplicates).

"In the biz", what Mighty Sync does is related to Record Linkage and Master Data Management.

There are several proprietary enterprise solutions available (i.e. complicated and expensive), and a good open-source alternative that requires someone who knows Python (although maybe anyone with access to ChatGPT or Claude is enough these days).

However, Mighty Sync is designed to do a few things the existing tools don't do:

  1. integrate directly with various web applications (ex. Salesforce and Mailchimp), including fetching and updating data directly (most existing tools just deal with CSV exports)
  2. deal with data in various shapes (ex. CSV vs. JSON)
  3. automatically discover how the fields in the various systems are related (ex. "Full Name" in Mailchimp is "First Name" + "Last Name" in Salesforce)
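
To make the third point concrete, here is a minimal sketch of one way field relationships could be discovered from the data alone. It is an illustration, not Mighty Sync's actual code: the function names (`normalize`, `candidate_columns`, `discover_mapping`), the sample records, and the simple value-overlap score are all assumptions made for this example.

```python
from itertools import permutations


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(str(value).lower().split())


def candidate_columns(records):
    """Yield (fields, values) for every single field and every ordered pair joined by a space."""
    fields = sorted(records[0].keys())
    for f in fields:
        yield (f,), {normalize(r[f]) for r in records}
    for a, b in permutations(fields, 2):
        yield (a, b), {normalize(f"{r[a]} {r[b]}") for r in records}


def discover_mapping(source_records, target_records):
    """For each target field, pick the source field (or pair) whose values overlap the most."""
    candidates = list(candidate_columns(source_records))
    mapping = {}
    for target_field in sorted(target_records[0].keys()):
        target_values = {normalize(r[target_field]) for r in target_records}

        def score(candidate):
            # Fraction of the target field's distinct values reproduced by this candidate.
            return len(candidate[1] & target_values) / len(target_values)

        best = max(candidates, key=score)
        mapping[target_field] = (best[0], score(best))
    return mapping


# Hypothetical sample data shaped like the example in the list above.
salesforce = [
    {"First Name": "Jane", "Last Name": "Smith", "Email": "jane@example.com"},
    {"First Name": "Jon", "Last Name": "Doe", "Email": "jon@example.com"},
    {"First Name": "Alex", "Last Name": "Smith", "Email": "alex@example.com"},
]
mailchimp = [
    {"Full Name": "Jane Smith", "Email Address": "jane@example.com"},
    {"Full Name": "Alex Smith", "Email Address": "alex@example.com"},
    {"Full Name": "Jon Doe", "Email Address": "jon@example.com"},
]

mapping = discover_mapping(salesforce, mailchimp)
for target_field, (source_fields, overlap) in mapping.items():
    print(target_field, "<-", " + ".join(source_fields), f"(overlap {overlap:.0%})")
```

Exact set overlap only works on clean data; with real records the comparison would presumably need to tolerate nicknames, typos, and partially overlapping populations.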

In its current state, Mighty Sync supports several integrations (Salesforce, Mailchimp, CSVs, JSON files, Google Spreadsheets, Airtable), though it still requires manual setup (the UI for a user to connect the services hasn't been written yet). To solve the linkage issue, it uses fancy statistical approaches similar to those used by the tools mentioned above. I use it fairly regularly with one of my clients.
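
For a flavour of what "statistical approaches" can mean here: a common model in the record-linkage literature is Fellegi-Sunter, which scores a pair of records by summing per-field log-likelihood ratios. The sketch below is a generic illustration of that model and an assumption on my part; it is not taken from Mighty Sync's code base, and the `m`/`u` probabilities and field names are made up for the example.

```python
import math

# Hypothetical parameters, chosen by hand for illustration:
#   m = P(field agrees | the two records are the same person)
#   u = P(field agrees | the two records are different people)
FIELD_PARAMS = {
    "email": {"m": 0.90, "u": 0.001},
    "last_name": {"m": 0.95, "u": 0.02},
    "first_name": {"m": 0.85, "u": 0.05},
}


def agrees(a, b):
    """Exact comparison after trimming whitespace and ignoring case."""
    return a.strip().lower() == b.strip().lower()


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over all fields (higher means more likely a match)."""
    total = 0.0
    for field, p in params.items():
        if agrees(record_a[field], record_b[field]):
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


jane_crm = {"first_name": "Jane", "last_name": "Smith", "email": "jane@example.com"}
jane_list = {"first_name": " jane", "last_name": "SMITH", "email": "jane@example.com"}
alex_crm = {"first_name": "Alex", "last_name": "Smith", "email": "alex@example.com"}

same_person = match_weight(jane_crm, jane_list)   # every field agrees
different = match_weight(jane_crm, alex_crm)      # only the last name agrees
print(f"{same_person:.2f} vs {different:.2f}")
```

In practice the `m` and `u` values are usually estimated from the data rather than set by hand, and field comparisons are typically fuzzy rather than exact.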

Mighty Sync isn't available publicly yet, but if you're reading this and have the problems that Mighty Sync tries to solve, please reach out and I can configure a private instance of the tool for you to try.

History

As with most projects, I'd been thinking about this for a while before eventually deciding to do something about it. This sort of chaos hurts my soul, and I am compelled to tame it.

I had been dealing with the mess described above at Next Canada; in particular, they had their alumni information in a whole bunch of places (Salesforce, on a public directory on their website, on Mailchimp, and in uncountable spreadsheets and Airtables), each with different information about the alumni and at different levels of "up-to-date-ness". They used no primary identifier, relying on names and emails to cross-reference.

Over the years, I wrote several one-off sync scripts to update one source based on another; but these scripts were flaky and not reusable.

...and when I started to think about how to make them more robust and reusable, well, it became this project.

Behind the scenes, the code base features some of the fanciest statistics we've done at Bloom to date, lots of performance-optimized code, some fancy live data streaming from the back-end to the front-end, some ingenious use of the Specter library, and some fun UI stuff.

...now if only someone else used it.

2023-09-19
mighty sync
:project-started-on 2019-05-02
:project-updated-on 2023-09-19
:post-created-on 2025-08-25
:post-updated-on 2025-09-23