Khan Engineering

Khan Engineering

We're the engineers behind Khan Academy. We're building a free, world-class education for anyone, anywhere.


Latest posts

Making Websites Work with Windows High Contrast Mode

Diedra Rater on March 21

Kotlin for Python developers

Aasmund Eldhuset on Nov 29, 2018

Using static analysis in Python, JavaScript and more to make your system safer

Kevin Dangoor on Jul 26, 2018

Kotlin on the server at Khan Academy

Colin Fuller on Jun 28, 2018

The Original Serverless Architecture is Still Here

Kevin Dangoor on May 31, 2018

What do software architects at Khan Academy do?

Kevin Dangoor on May 14, 2018

New data pipeline management platform at Khan Academy

Ragini Gupta on Apr 30, 2018

Untangling our Python Code

Carter J. Bastian on Apr 16, 2018

Slicker: A Tool for Moving Things in Python

Ben Kraft on Apr 2, 2018

The Great Python Refactor of 2017 And Also 2018

Craig Silverstein on Mar 19, 2018

Working Remotely

Scott Grant on Oct 2, 2017

Tips for giving your first code reviews

Hannah Blumberg on Sep 18, 2017

Let's Reduce! A Gentle Introduction to Javascript's Reduce Method

Josh Comeau on Jul 10, 2017

Creating Query Components with Apollo

Brian Genisio on Jun 12, 2017

Migrating to a Mobile Monorepo for React Native

Jared Forsyth on May 29, 2017

Memcached-Backed Content Infrastructure

Ben Kraft on May 15, 2017

Profiling App Engine Memcached

Ben Kraft on May 1, 2017

App Engine Flex Language Shootout

Amos Latteier on Apr 17, 2017

What's New in OSS at Khan Academy

Brian Genisio on Apr 3, 2017

Automating App Store Screenshots

Bryan Clark on Mar 27, 2017

It's Okay to Break Things: Reflections on Khan Academy's Healthy Hackathon

Kimerie Green on Mar 6, 2017

Interning at Khan Academy: from student to intern

Shadaj Laddad on Dec 12, 2016

Prototyping with Framer

Nick Breen on Oct 3, 2016

Evolving our content infrastructure

William Chargin on Sep 19, 2016

Building a Really, Really Small Android App

Charlie Marsh on Aug 22, 2016

A Case for Time Tracking: Data Driven Time-Management

Oliver Northwood on Aug 8, 2016

Time Management at Khan Academy

Several Authors on Jul 25, 2016

Hackathons Can Be Healthy

Tom Yedwab on Jul 11, 2016

Ensuring transaction-safety in Google App Engine

Craig Silverstein on Jun 27, 2016

The User Write Lock: an Alternative to Transactions for Google App Engine

Craig Silverstein on Jun 20, 2016

Khan Academy's Engineering Principles

Ben Kamens on Jun 6, 2016

Minimizing the length of regular expressions, in practice

Craig Silverstein on May 23, 2016

Introducing SwiftTweaks

Bryan Clark on May 9, 2016

The Autonomous Dumbledore

Evy Kassirer on Apr 25, 2016

Engineering career development at Khan Academy

Ben Eater on Apr 11, 2016

Inline CSS at Khan Academy: Aphrodite

Jamie Wong on Mar 29, 2016

Starting Android at Khan Academy

Ben Komalo on Feb 29, 2016

Automating Highly Similar Translations

Kevin Barabash on Feb 15, 2016

The weekly snippet-server: open-sourced

Craig Silverstein on Feb 1, 2016

Stories from our latest intern class

2015 Interns on Dec 21, 2015

Kanbanning the LearnStorm Dev Process

Kevin Dangoor on Dec 7, 2015

Forgo JS packaging? Not so fast

Craig Silverstein on Nov 23, 2015

Switching to Slack

Benjamin Pollack on Nov 9, 2015

Receiving feedback as an intern at Khan Academy

David Wang on Oct 26, 2015

Schrödinger's deploys no more: how we update translations

Chelsea Voss on Oct 12, 2015

i18nize-templates: Internationalization After the Fact

Craig Silverstein on Sep 28, 2015

Making thumbnails fast

William Chargin on Sep 14, 2015

Copy-pasting more than just text

Sam Lau on Aug 31, 2015

No cheating allowed!!

Phillip Lemons on Aug 17, 2015

Fun with slope fields, css and react

Marcos Ojeda on Aug 5, 2015

Khan Academy: a new employee's primer

Riley Shaw on Jul 20, 2015

How wooden puzzles can destroy dev teams

John Sullivan on Jul 6, 2015

Babel in Khan Academy's i18n Toolchain

Kevin Barabash on Jun 22, 2015

tota11y - an accessibility visualization toolkit

Jordan Scales on Jun 8, 2015


Untangling our Python Code

by Carter J. Bastian on Apr 16, 2018

The previous posts about The Great Khan Academy Python Refactor of 2017 and Also 2018 answered two questions: why and how did we refactor all of our Python code? In this post, I want to look closer at a major goal of this project: cleaning up dependencies between parts of our Python codebase.

Going into the project, we decided that we would build a dependency order into the structure of our Python code. Below, we’ll look at why dependency order matters, what our solution looked like, and how we went about implementing it.

Why is dependency order important?

Take a look at a visual representation of Khan Academy’s code base:

A picture of a huge dependency graph; little structure is visible amidst the tangle

Each dot in this image represents a module in our Python codebase. Each line represents an import of one module by another. tl;dr our Python code was a tangle!

Our dependency tangle looped back on itself in complicated and frustrating ways that posed a series of problems:

  • Code didn’t live where it "should", making it hard both for new developers to get oriented in the codebase and for experienced developers to find the particular functionality they’re looking for.
  • Given an arbitrary change, we didn’t know what to test because we weren’t sure what code might indirectly use the code we changed.
  • Intertwined dependencies made it difficult to draw boundaries between logically distinct parts of the codebase. This in turn made it harder to put stronger service-boundaries on our codebase.

In short, we had a lot of problems stemming from poor code organization.

In addition to these retroactive changes, our less-tangled codebase gives guidance to developers writing new code. Once we put in the work of defining what should and should not be depended on, we know to question any feature design that imports code from higher in the hierarchy. Since this dependency order is built directly into our code structure, we can automatically detect places in the codebase that are either poorly designed or poorly organized.

What does dependency order even mean?

In general, a dependency order is a set of rules deciding which files can be imported by a piece of code in a codebase. Some languages and frameworks implement dependency order by placing restrictions on possibly-problematic dependency practices. Since Python doesn't provide any such rules, we decided to write some of our own.

We decided to create these rules by placing each Python file in our codebase into some conceptual bucket, which we call a "package", and then defining an ordering on these buckets. The ordering rule is that a file in bucket X can import a file in bucket Y if and only if Y is lower than X in the ordering.

In other words, we divided all of our code into packages and placed these packages in a (mostly) linear order, where any given package is allowed to depend on packages lower than itself.

How we decided on the dependency ordering

It was not easy to decide how to order the packages. In theory any directed acyclic graph would be a legal ordering. But we decided to limit ourselves to a linear ordering, with one exception: we sometimes had several packages at the same "level". These packages could not depend on one another, but any package above them could depend on any (or all!) of them.

We did this both to limit the universe of possible orderings, and to make it easier for humans to understand the final result. A linear ordering is easy to understand and explain to others.

However, this did open up an issue that we sometimes had to order two totally unrelated packages, and arbitrarily decide which one was higher and which was lower. Sometimes we would decide by calculating the number of bad dependencies both ways, and seeing which led to fewer bad dependencies!

How we untangled our codebase

In order to define and enforce our new dependency ordering, we decided to have our 40 packages be top level directories. Any time a file in a lower-level package depends on a file in a higher-level package, we can classify that as a "bad dependency".

By defining what can (and more importantly, what can't) be depended on by any given code feature, we can automatically detect places in our codebase that are architecturally problematic.

Once we had a set of packages and their ordering figured out, the first step towards implementation was to move each file in our codebase into the appropriate package, as described in the earlier blog posts. This allowed us to reason about our codebase's dependency structure.

For example, consider the following excerpt from our package list and their descriptions:

lib/ - Miscellaneous libraries. Utilities with some Khan-specific logic that are still general enough to be used across many components.

flags/ - Infrastructure implementing A/B testing and feature flagging.

coaches/ - Contains core models and mechanics for coach/student (including parent/child and teacher/pupil). Also includes code related to classrooms and schools.

These three packages above are listed in dependency order. In other words, it’s fine for the code in flags/ to depend on the more-general code in lib/. Similarly, the higher-level code in coaches/ implements the coach-student relationship, and should be able to use the lower-level infrastructure in flags/ to set up coach/student-related experiments.

Conversely, if a file in the flags/ directory were to import a file from the coaches/ directory, that would violate our dependency order and we would classify that as a "bad dependency". Conceptually, this feels right; for example, it doesn’t seem like good design for the infrastructure implementing experiments to depend on specific, product-level experiments.

The second step in our process was to find the places where our code wasn’t following the new dependency rules. We used a script for this which output an HTML report on our dependencies.

Specifically, this was big list of all the dependencies in our codebase broken down by package. For example, here’s an excerpt from one of our package reports listing the bad dependencies (in red) and good dependencies (in green) for one file in the flags/ packages:


  • flags/ depends on coaches/, coaches/, coaches/, sat/, translations/, flags/bigbingo/, flags/, flags/feature_flags/, flags/gandalf/, intl/, web/request/, web/request/


Diagnostic Report

Of 6787 total dependencies, there are 862 bad dependencies (12.700751%).

As you can see, the example given above was an actual issue in our codebase; the experiment infrastructure was depending on the product logic underlying specific experiments in the coaches/ package.

The third and final step was to fix hubs of bad dependencies. This process changed on a case-by-case basis — the cause underlying bad dependencies could be anything from poorly designed code to poorly defined packages.

More often than not, a file with lots of bad dependencies was indicative of bad code organization — either the file was in the wrong place, or the file was doing too many things. In these cases, we were able to use that code smell to refactor files relatively easily and improve our code organization.

For example, consider the following excerpt from flags/, which has both a function used for logging opt-in experiments, and also a function used to determine if a user is in a classroom so that we can consider showing them coach-related experiments:

from __future__ import absolute_import

import coaches.students  # BAD DEPENDENCY
import coaches.teachers  # BAD DEPENDENCY

def log_opt_in_experiment_event(user_id, experiment_name,
    # ... add analytics for an event

def is_classroom_user(user_data):
    """Determine if user is shown classroom A/B tests"""
    if coaches.teachers.classified_as_teacher(user_data):
        return CLASSROOM_USER

    if (user_data.classroom_user_status != 
        return user_data.classroom_user_status

        user_data, caller='is_classroom_user')

    return user_data.NOT_CLASSROOM_USER

In this experiment, the bad dependencies are coming from the second function, and they indicate a valid organizational problem: is_classroom_user boils down to an implementation detail for coaching-specific experiments. Since, implementation of specific experiments is outside the scope of experiment infrastructure, flags/ is trying to do something outside the scope of its package.

Thankfully, this is a pretty easy fix: we can just use slicker to move the offending function to wherever the coach-related experiments actually live (in this case, coaches/

After moving this function, flags/ won’t depend on coaches.teachers or coaches.students, and our updated package report will look like this:


  • flags/ depends on coaches/, sat/, translations/, flags/bigbingo/, flags/, flags/feature_flags/, flags/gandalf/, intl/, web/request/, web/request/


Diagnostic Report

Of 6787 total dependencies, there are 860 bad dependencies (12.67128%).

And just like that, our codebase is two bad dependencies cleaner. Of course, the fixes are rarely so small or straight-forward. We’d often find tangles of poor code structure that would decrease bad dependencies by quite a bit. The process was roughly the same though and can continue ad infinitum; given more time to work on the project, we’d simply iterate on steps two and three (looking for bad dependencies and then breaking them up) making the codebase incrementally cleaner.

The results

On one hand, we weren’t able to get rid of all of our bad dependencies – we still have a whitelist of about 750 imports that violate our new rules.

On the other hand, getting to a codebase that exactly matches our self-imposed dependency structure doesn’t prevent us from getting much of the benefit that such an architecture provides. Even with the many exceptions to our new dependency restrictions, it is much easier to reason about where functionality lives and how to test a given piece of code.

To start, one big win is that we were able to record a measurable improvement in our codebase’s structure! Having 1) built out a formalized notion of what a bad dependency is and 2) written tools for finding them in our codebase, we've been able to measure the "goodness" of our dependency structure over the course of the project.

When we first took this measurement, we found that about 15.2% of all of our inter-package dependencies were bad. Taking this measurement the codebase in production right after the end of the project, we had reduced the percent of bad dependencies to 7.8%.

7.8% may still seem like a lot, but you can visually see the difference. Below is the dependency graph generated after our changes. Compared to the first tangle, we see that a lot of the dependencies that had been scattered all over the place have now been concentrated into a few clusters around our inter-package APIs, making the tangle appear... lighter.

A picture of a huge dependency graph; there is still a tangle at the center but there are some other, more distinct groupings

What’s more, the benefits of this continue to help us develop clean code quickly. We now have a check that’s run at commit time that will let you know if your changes added a new bad import. In other words, you can’t accidentally introduce dependencies on higher-level code; if you’re going to violate the new rules, you have to do so explicitly.

These effects, in tandem with the clean-ups and refactors that came along with our dependency-chasing process above, left our codebase in a good place. Our self-imposed dependency order will help us develop faster and better in 2018 and beyond.

Big thanks to all of my teammates who helped get this blogpost from its draft to publish, especially Kevin Dangoor, Scott Grant, Ben Kraft, and Craig Silverstein.