Obviously Wrong

Using Data Science To Name Your Baby

So, we just had a baby girl a few days ago! (Yay! Mom and baby are both healthy and happy.) I’m obviously excited to be a father for the first time, but… I figure at least 70% of that excitement is about cool projects I can do with her.

Project #1: What’s Going To Be Her Name?!

Turns out it’s harder than you’d think to name a (mixed-race) baby. Do you do a Jewish first name and Indian middle name? What about spellability and pronounceability? Go 100% American? Try finding something that’s profound in both Sanskrit and Hebrew? Oof.

One day, as my wife and I were sitting in Los Angeles trying to brainstorm names for what felt like the millionth time, I had an idea — couldn’t you use data science to do this? (!)

Well, obvs.

After a bit of negotiating, my wife and I agreed on a few criteria:

That shouldn’t be too hard to prototype, right? I spent a day or two wrangling datasets, pandas and soundex. It ended up being at least a little helpful, so I spent another day making it publicly available.


Without further ado, here’s NameBlender! [*] Wondering about that melanin-enhanced cutie across the room? Or that impossibly adorable redhead sitting next to you? Give it a whirl — see what your hypothetical future genetic collisions could be named!

[*] Given the prototype nature, there might be a few false positives. Specifically, it looks like there’s noise from mixed-race marriages that I haven’t had the chance to filter out. Be gentle.

Here are 100+ names for girls with Indian and Jewish roots that are 7 letters or fewer, sorted by American popularity, ethnic uniqueness and phonetic similarity:

Baby Lia!

And the ultimate demo? Here’s Lia, our new baby girl:


(Skip this part unless you’re looking for a late-night cure for your insomnia, or you’re a data nerd.)

First, I downloaded 90.2 million political contributions made between 2008 and 2014 from Stanford’s DIME dataset and scraped Family Education’s list of surnames by ethnicity. After removing corporations and deduping multiple contributions from the same donor, I classified 13.5 million unique donors by ethnicity using uniquely-ethnic surnames. Finally, after extracting first names that appeared at least 3 times (N≥3) in the dataset, I used fuzzy soundex and metaphone to cluster together names that sounded similar in different languages. All of this was written with Python, pandas and pyphonetics, and served using Flask.
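For the curious, here’s roughly what that pipeline looks like in miniature. This is a toy sketch, not the actual NameBlender code: the surname lists, the sample donors and the function names (`classify`, `blend`) are all made up for illustration, it uses only the standard library instead of pandas and pyphonetics, and it swaps in plain classic Soundex where the real thing used fuzzy soundex and metaphone.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the scraped surname-by-ethnicity lists.
SURNAMES_BY_ETHNICITY = {
    "indian": {"PATEL", "SHARMA"},
    "jewish": {"COHEN", "LEVY"},
}

# Classic Soundex letter groups (an assumption: the post used fuzzy soundex).
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(name):
    """Classic American Soundex: first letter plus three digits."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = SOUNDEX_CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters sharing a code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def classify(surname):
    """Return an ethnicity only if the surname appears in exactly one list."""
    matches = [eth for eth, names in SURNAMES_BY_ETHNICITY.items()
               if surname.upper() in names]
    return matches[0] if len(matches) == 1 else None


def blend(contributions, min_count=3):
    """contributions: iterable of (donor_id, first_name, surname) tuples."""
    # Dedupe: one row per donor, however many contributions they made.
    donors = {donor_id: (first, last) for donor_id, first, last in contributions}

    # Count first names per ethnicity, skipping unclassifiable surnames.
    counts = defaultdict(Counter)
    for first, last in donors.values():
        eth = classify(last)
        if eth:
            counts[eth][first.capitalize()] += 1

    # Cluster sufficiently common first names by phonetic key.
    clusters = defaultdict(lambda: defaultdict(set))
    for eth, counter in counts.items():
        for first, n in counter.items():
            if n >= min_count:
                clusters[soundex(first)][eth].add(first)

    # Keep only the keys where every ethnicity contributed a name.
    return {key: {eth: sorted(names) for eth, names in by_eth.items()}
            for key, by_eth in clusters.items()
            if len(by_eth) == len(SURNAMES_BY_ETHNICITY)}


# Made-up sample data.
contributions = [
    (1, "Lia", "Patel"), (1, "Lia", "Patel"),  # same donor, two contributions
    (2, "Lia", "Sharma"), (3, "Lia", "Patel"),
    (4, "Leah", "Cohen"), (5, "Leah", "Levy"), (6, "Leah", "Cohen"),
    (7, "Ruth", "Cohen"),  # appears only once, so it is dropped
]
print(blend(contributions))
```

Plain Soundex is pretty blunt (it throws away vowels entirely, so "Maya" and "Mia" land in the same bucket), which is presumably part of why a fuzzier phonetic match plus a popularity sort makes for a more useful list.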
