Building a Better GitHub Platform with GraphQL

Brandon Black

GitHub is one of the largest and most influential tech companies in existence. Today, it's home to more than 38 million projects and 15 million users. It's a hub of communication and collaboration, and the go-to location for open-source software.

As large as GitHub has become, the core application remains a single, giant, monolithic entity. This has been a concern for the company's forward-thinking developers, who want to build new features, make the platform more transparent, and make it easier for the service to be used at scale. Although GitHub mainly uses Ruby, many of its engineers are interested in exploring new types of languages and technologies. Internally, GitHub's dev teams would also like to make collaboration between teams simpler and more seamless.

Enter GraphQL. Since March 2016, GitHub developers have been working internally on a GraphQL interface. By September, the team decided that it was valuable enough to release for use by external developers as well.

The Problem with REST

One major pain point that GitHub developers experienced with REST APIs is that retrieving the data you want often requires multiple calls. In GitHub's case, this would create inefficiencies by fetching the entire repository multiple times, when often you need only a single data point such as the repository's name or status.

An additional problem with REST APIs was that the core product and the API were often out of sync. Sometimes the API lagged behind product updates, and a new product feature that external developers wanted to work with was unavailable in the API.

GraphQL has solved both of these issues for GitHub. First, GraphQL allows you to request only the data that you need at any given point in your application. Second, features are always available in the API because they're first built internally and only then exposed externally for others to use.
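
To make the first point concrete, here is a minimal sketch of a request body that asks for only two fields of a repository. The helper name `build_repo_query` and the sample owner and repository values are assumptions made for illustration, and the selected fields are just an example of a narrow selection, not a statement of what GitHub's clients actually send.

```python
import json

# Hypothetical helper: builds the JSON body for a GraphQL POST request.
# The query selects only the fields the caller needs, nothing else.
def build_repo_query(owner: str, name: str) -> str:
    query = """
    query($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        name
        description
      }
    }
    """
    # Variables travel alongside the query text instead of being
    # interpolated into it.
    return json.dumps({"query": query, "variables": {"owner": owner, "name": name}})

# Sample values are placeholders, not taken from the article.
body = build_repo_query("octocat", "Hello-World")
```

The server would answer with a JSON document shaped like the selection, so a client that needs only a name and a description does not receive the rest of the repository record.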

Why GraphQL?

Although it was possible to build their own solution, GitHub chose to use GraphQL for several reasons. First, GraphQL had a clear value proposition for the features that it provided, including built-in documentation, static typing, and its unopinionated stance. It gave GitHub developers a standard on which to build, while also remaining agnostic in terms of implementation.

As an organization that has been so strongly linked with open-source software, GitHub also appreciated GraphQL's own lively, productive, open-source community. Because GitHub uses its own customized Rails stack, it hasn't had as many opportunities to give back to the community with its internal frameworks. When it found GraphQL, GitHub saw an early opportunity to contribute to a valuable piece of open-source technology. This has allowed GitHub developers to identify and solve a lot of big, exciting problems that other people haven't seen yet.

Finally, GraphQL aligns well with many of the objectives that developers have for the GitHub platform: things like using a service-oriented architecture and diversifying the company's tools and technologies. In addition, GraphQL has changed the way that GitHub ships its product by making the team think about how changes to their API affect both internal and external developers.

Open-Source Projects

In the course of using GraphQL, GitHub has worked on several open-source projects that help developers migrate to and work with the technology. The team has written a GraphQL client in Ruby that is used internally to power many of the GitHub core features. The client provides query validation in Rails views to make sure that users aren't fetching too much data, and it handles query parsing and execution for the developer.

To address developers' concerns about accidentally leaking information through a connection, the GitHub security team built another open-source project: a GraphQL walker that traverses the schema looking for node access discrepancies in the interface. Starting at a root node, the program searches for IDs retrieved via a connection that shouldn't be accessible through a direct node lookup.
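
The general idea of such a walker can be sketched as a breadth-first traversal that compares two access paths. This is not GitHub's implementation: the in-memory graph, the `DIRECTLY_ACCESSIBLE` policy set, and the function names are all hypothetical, and the choice to flag IDs that a connection exposes but a direct lookup denies is an assumption about which direction counts as a discrepancy.

```python
from collections import deque

# Toy in-memory graph (hypothetical data):
# node ID -> IDs reachable through that node's connections.
CONNECTIONS = {
    "root": ["repo1", "repo2"],
    "repo1": ["issue1"],
    "repo2": ["issue2"],
    "issue1": [],
    "issue2": [],
}

# Hypothetical policy: IDs the current viewer may fetch by direct node lookup.
DIRECTLY_ACCESSIBLE = {"root", "repo1", "issue1", "issue2"}


def can_lookup_directly(node_id: str) -> bool:
    """Stand-in for a direct node lookup performed as the current viewer."""
    return node_id in DIRECTLY_ACCESSIBLE


def walk(root: str) -> list[str]:
    """Walk breadth-first from root and return every ID that was exposed
    through a connection but is denied by a direct lookup."""
    seen = {root}
    queue = deque([root])
    discrepancies = []
    while queue:
        current = queue.popleft()
        for child in CONNECTIONS.get(current, []):
            if child in seen:
                continue
            seen.add(child)
            queue.append(child)
            # The two access paths disagree: record the ID for review.
            if not can_lookup_directly(child):
                discrepancies.append(child)
    return discrepancies


result = walk("root")
```

In this toy data set the walk reports `repo2`, the one ID that a connection returns even though the direct lookup policy excludes it. A real walker would issue queries against a live schema rather than read from dictionaries.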

GraphQL Challenges

Although GitHub has been very pleased with its adoption of GraphQL as a whole, the choice has not come without certain issues that the GitHub team has had to address and navigate.
