Coursera is one of the top companies in the field of educational technology, offering massive open online courses by partnering with renowned institutions of higher learning. Every day, the Coursera engineering team faces complex technical and infrastructural challenges, requiring strong problem-solving skills and a keen awareness of the development landscape.

Like many other tech companies, Coursera has embraced GraphQL and is currently working to migrate nearly all of its client data access to GraphQL by the end of 2017. How did the Coursera team arrive at this decision, and what were the steps along the technical journey that led to this point?

Early Solutions: JSON

During Coursera's early days as a small startup, its technical foundations were kept very simple. The front-end team put together a JavaScript-based single-page application, while the back-end team provided the JSON that the application needed to recreate the Coursera catalog on the web page. Because the catalog was fairly small, the developers chose to serve it in its entirety in a single API call. This choice allowed the Coursera home page to render and search results to return very quickly.

As the team and the catalog grew, the shortcomings of this approach quickly became evident. For one, the number of courses grew rapidly, from roughly 10 to more than 1,500, so the client had to download a megabyte of JSON to display only a dozen courses on the home page. More significantly, the product needs were changing over time.
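To make the overhead concrete, the following is a rough Python sketch using a synthetic catalog. The record fields, the 500-character description and the resulting byte counts are assumptions chosen for illustration; they are not Coursera's actual data.

```python
import json

# Hypothetical catalog: 1,500 synthetic course records (field names are assumptions).
catalog = [
    {
        "id": i,
        "name": f"Course {i}",
        "description": "x" * 500,  # stand-in for a long course description
        "partner": f"University {i % 100}",
    }
    for i in range(1500)
]

full_payload = json.dumps(catalog)       # what a single "send everything" call returns
home_payload = json.dumps(catalog[:12])  # what a dozen home-page tiles actually need

# Share of the downloaded bytes that the home page never displays.
wasted_fraction = 1 - len(home_payload) / len(full_payload)
print(f"full: {len(full_payload)} bytes, home page: {len(home_payload)} bytes")
print(f"unused by the home page: {wasted_fraction:.1%}")
```

With records of this size, nearly all of the transferred bytes go unused by the home page, which is the cost the single-call design imposes once the catalog grows.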

The initial monolithic API couldn't sufficiently handle new features, such as premium course certificates, specializations and multi-university course collaborations. In some cases, these features violated the API's core assumptions: for example, that each course would have only one university as a partner. Finally, the growing size of the engineering team meant that developers, particularly on the back end, were continually stepping on each other's toes, creating merge conflicts as they tried to add new features.
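The single-partner assumption is easy to picture as a data-model problem. The sketch below is purely illustrative: the class and field names (CourseV1, CourseV2, partner, partners) are hypothetical and do not come from Coursera's code.

```python
from dataclasses import dataclass, field

@dataclass
class CourseV1:
    """Early shape: exactly one partner is baked into the record."""
    name: str
    partner: str

@dataclass
class CourseV2:
    """Revised shape: a course may list any number of partners."""
    name: str
    partners: list[str] = field(default_factory=list)

def upgrade(course: CourseV1) -> CourseV2:
    # A single-partner course becomes a one-element list in the new shape.
    return CourseV2(name=course.name, partners=[course.partner])

solo = upgrade(CourseV1(name="Intro to Logic", partner="University A"))
joint = CourseV2(name="Joint Course", partners=["University A", "University B"])
```

Every client that reads the old partner field has to change when the shape changes, which is why an assumption like this is costly to undo in a monolithic API.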

From this initial experience, the Coursera team learned that scalability is a very important consideration, especially for small companies that grow extremely quickly. An application must scale along several different dimensions: the number of queries per second, the size of the data set, the size of the engineering team and the number of distinct types in the application's ontology.

Early Solutions: Scala

In search of a better type system, Coursera next chose to work with microservices using Scala and the Play framework. Developers found Scala's static type system appealing, but Play's flexibility was actually counterproductive. The control that Play allowed over details such as headers, body parsing and query parameters was more than the engineering team needed or wanted.

Meanwhile, Coursera realized that many of its users were accessing the website from outside the United States, using mobile phones with 2G or 3G connections. As such, developers had two main requirements for the API: It needed to minimize the amount of data transferred to benefit these bandwidth-constrained users, and it needed to minimize the number of round trips required to retrieve the data.

After formulating these two requirements, the Coursera team found itself faced with a choice of two mutually exclusive options. The first was to use "experience-based APIs," in which each "experience" on the website, from the homepage to the course catalog to the course description page, uses a different API that is perfectly matched to that experience. This idea fit the requirements, but developers were reluctant to use it for two reasons: It would couple the API implementation and the client views tightly together, and it would be a resource-intensive project that might be too much for a startup with limited engineering resources. The second option was to build a single, queryable API. Every "experience" can query the API in different ways to receive the data that it requires. As the Coursera team contemplated this second option, it realized that what it really wanted to use was a database.
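The contrast between the two options can be sketched in a few lines of Python. The function names, fields and data here are hypothetical stand-ins, not Coursera's endpoints.

```python
# Hypothetical in-memory data; field names are illustrative only.
COURSES = [
    {"id": 1, "name": "Machine Learning", "description": "long text", "partner": "University A"},
    {"id": 2, "name": "Cryptography", "description": "long text", "partner": "University B"},
]

# Option 1: one endpoint per experience, each returning a fixed shape.
def homepage_api():
    return [{"id": c["id"], "name": c["name"]} for c in COURSES]

def course_page_api(course_id):
    return next(c for c in COURSES if c["id"] == course_id)

# Option 2: one endpoint; each caller states which fields it needs.
def query_api(fields, course_id=None):
    rows = [c for c in COURSES if course_id is None or c["id"] == course_id]
    return [{f: c[f] for f in fields} for c in rows]

# The same two experiences, expressed as queries against the single endpoint.
home_view = query_api(["id", "name"])
course_view = query_api(["id", "name", "description", "partner"], course_id=2)
```

In the first option, every new page means a new server-side function. In the second, a new page only means a new query, which keeps the API decoupled from the client views.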

Current Solutions: Naptime and GraphQL

In 2014, the Coursera team began to build Naptime, its own framework for writing REST APIs. Naptime gave developers the power to perform three of the most important operations in relational algebra to allow users to query the Coursera database: projection, selection and join. Soon after Naptime went live at Coursera, developers began to voluntarily switch over from using the stock Play framework APIs, rapidly adopting the new framework.
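For readers less familiar with relational algebra, the three operations can be sketched over plain Python lists. This is a minimal illustration of projection, selection and join in general, with made-up data and helper names; it is not Naptime's API.

```python
# Hypothetical data sets standing in for two resources.
COURSES = [
    {"id": "c1", "name": "Algorithms", "partner_id": "p1"},
    {"id": "c2", "name": "Statistics", "partner_id": "p2"},
    {"id": "c3", "name": "Databases", "partner_id": "p1"},
]
PARTNERS = [
    {"id": "p1", "name": "University A"},
    {"id": "p2", "name": "University B"},
]

def select(rows, predicate):
    """Selection: keep only rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, fields):
    """Projection: keep only the named fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]

def join(left, right, left_key, right_key, prefix):
    """Join: pair each left row with the right rows whose key matches.

    Fields from the right row are copied in under "<prefix>_<field>".
    """
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[left_key], []):
            merged = dict(left_row)
            merged.update({f"{prefix}_{k}": v for k, v in right_row.items()})
            joined.append(merged)
    return joined

# Courses offered by partner p1, joined with partner data, trimmed to two fields.
p1_courses = select(COURSES, lambda c: c["partner_id"] == "p1")
result = project(
    join(p1_courses, PARTNERS, "partner_id", "id", "partner"),
    ["name", "partner_name"],
)
```

Selection limits which rows are returned, projection limits which fields are sent, and the join brings related data back in the same request, which together serve the two goals of less data and fewer round trips.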

However, a number of problems still remained after implementing Naptime. For one, data about related resources was sent as a flattened list, which made it difficult for clients to work with. In addition, as APIs and products changed over time, the URL syntax became fairly cumbersome for developers to maintain, with extra fields sometimes remaining in the URL.
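The flattened-list problem can be illustrated with a small sketch. The response shape below (the elements, linked and partnerIds keys) is an assumption made for the example, and nest_partners is a hypothetical client-side helper.

```python
# Hypothetical flattened response: related partners arrive in a separate list.
flat_response = {
    "elements": [
        {"id": "c1", "name": "Algorithms", "partnerIds": ["p1"]},
        {"id": "c2", "name": "Joint Course", "partnerIds": ["p1", "p2"]},
    ],
    "linked": {
        "partners": [
            {"id": "p1", "name": "University A"},
            {"id": "p2", "name": "University B"},
        ]
    },
}

def nest_partners(response):
    """Re-attach each course's partners by looking them up by id."""
    partners_by_id = {p["id"]: p for p in response["linked"]["partners"]}
    return [
        {
            "id": course["id"],
            "name": course["name"],
            "partners": [partners_by_id[pid] for pid in course["partnerIds"]],
        }
        for course in response["elements"]
    ]

nested = nest_partners(flat_response)
```

Each client has to write and maintain this kind of stitching code for every relationship it uses, which is the burden a nested response format removes.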

Because GraphQL queries are easy to read, modify and maintain, the Coursera team turned to GraphQL to address some of these weaknesses. Today, the Coursera architecture includes dozens of microservices, each handling a different subset of the platform, from profiles to enrollments to the course catalog. To navigate between these various microservices, developers have introduced a GraphQL assembler service with a unified schema that talks to each service and aggregates the data that it receives in response.
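The assembler idea can be approximated without a GraphQL library. The sketch below is a simplified stand-in, not Coursera's implementation: the three service functions, the RESOLVERS table and the assemble helper are all hypothetical, and the query is passed as a plain dictionary rather than parsed GraphQL.

```python
# Stand-ins for separate microservices (hypothetical names and data).
def catalog_service(course_id):
    return {"id": course_id, "name": "Algorithms", "partnerId": "p1"}

def partner_service(partner_id):
    return {"id": partner_id, "name": "University A"}

def enrollment_service(course_id):
    return {"courseId": course_id, "count": 1200}

# The unified schema: each nested field knows which service resolves it.
RESOLVERS = {
    "course": {
        "_root": lambda args: catalog_service(args["id"]),
        "partner": lambda course: partner_service(course["partnerId"]),
        "enrollments": lambda course: enrollment_service(course["id"]),
    }
}

def assemble(root_field, args, selection):
    """Resolve the root object, then fill in only the selected fields."""
    resolvers = RESOLVERS[root_field]
    parent = resolvers["_root"](args)
    result = {}
    for field_name, sub_selection in selection.items():
        if field_name in resolvers:
            child = resolvers[field_name](parent)  # call another service
            result[field_name] = {k: child[k] for k in sub_selection}
        else:
            result[field_name] = parent[field_name]  # scalar on the root object
    return result

# Roughly the shape of: { course(id: "c1") { name partner { name } enrollments { count } } }
response = assemble(
    "course",
    {"id": "c1"},
    {"name": None, "partner": ["name"], "enrollments": ["count"]},
)
```

The client sends one nested query and receives one nested response, while the assembler decides which services to call and merges their answers.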

Throughout this technical evolution, the Coursera team has learned several valuable lessons: