The Stream Blog

Why we switched from Python to Go

Switching to a new language is always a big step, especially when only one of your team members has prior experience with that language. Early this year, we switched Stream’s primary programming language from Python to Go. This post will explain some of the reasons why we decided to leave Python behind and make the switch to Go.

Thanks to Ren Sakamoto for translating Why we switched from Python to Go into Japanese, なぜ私達は Python から Go に移行したのか.

Reasons to Use Go

Reason 1 – Performance

Go is fast!
Go is fast!

Go is extremely fast. The performance is similar to that of Java or C++. For our use case, Go is typically 30 times faster than Python. Here’s a small benchmark game comparing Go vs Java.

Reason 2 – Language Performance Matters

For many applications, the programming language is simply the glue between the app and the database. The performance of the language itself usually doesn’t matter much.

Stream, however, is an API provider powering the feed infrastructure for 500 companies and more than 200 million end users. We’ve been optimizing Cassandra, PostgreSQL, Redis, etc. for years, but eventually, you reach the limits of the language you’re using.

Python is a great language but its performance is pretty sluggish for use cases such as serialization/deserialization, ranking and aggregation. We frequently ran into performance issues where Cassandra would take 1ms to retrieve the data and Python would spend the next 10ms turning it into objects.

Reason 3 – Developer Productivity & Not Getting Too Creative

Have a look at this little snippet of Go code from the How I Start Go tutorial. (This is a great tutorial and a good starting point to pick up a bit of Go.)

package main

type openWeatherMap struct{}

func (w openWeatherMap) temperature(city string) (float64, error) {
	resp, err := http.Get("http://api.openweathermap.org/data/2.5/weather?APPID=YOUR_API_KEY&q=" + city)
	if err != nil {
		return 0, err
	}

	defer resp.Body.Close()

	var d struct {
		Main struct {
			Kelvin float64 `json:"temp"`
		} `json:"main"`
	}

	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return 0, err
	}

	log.Printf("openWeatherMap: %s: %.2f", city, d.Main.Kelvin)
	return d.Main.Kelvin, nil
}

If you’re new to Go, there’s not much that will surprise you when reading that little code snippet. It showcases multiple assignments, data structures, pointers, formatting and a built-in HTTP library.

When I first started programming I always loved using Python’s more advanced features. Python allows you to get pretty creative with the code you’re writing. For instance, you can:

  • Use MetaClasses to self-register classes upon code initialization
  • Swap out True and False
  • Add functions to the list of built-in functions
  • Overload operators via magic methods
  • Use functions as properties via the @property decorator

These features are fun to play around with but, as most programmers will agree, they often make the code harder to understand when reading someone else’s work.

Go forces you to stick to the basics. This makes it very easy to read anyone’s code and immediately understand what’s going on.

Note: How “easy” it is really depends on your use case, of course. If you want to create a basic CRUD API I’d still recommend Django + DRF, or Rails.

Reason 4 – Concurrency & Channels

As a language, Go tries to keep things simple. It doesn’t introduce many new concepts. The focus is on creating a simple language that is incredibly fast and easy to work with. The only area where it does get innovative is goroutines and channels. (To be 100% correct the concept of CSP started in 1977, so this innovation is more of a new approach to an old idea.) Goroutines are Go’s lightweight approach to threading, and channels are the preferred way to communicate between goroutines.

Goroutines are very cheap to create and only take a few KBs of additional memory. Because Goroutines are so light, it is possible to have hundreds or even thousands of them running at the same time.

You can communicate between goroutines using channels. The Go runtime handles all the complexity. The goroutines and channel-based approach to concurrency makes it very easy to use all available CPU cores and handle concurrent IO – all without complicating development. Compared to Python/Java, running a function on a goroutine requires minimal boilerplate code. You simply prepend the function call with the keyword “go”:

package main

import (
	"fmt"
	"time"
)

func say(s string) {
	for i := 0; i < 5; i++ {
		time.Sleep(100 * time.Millisecond)
		fmt.Println(s)
	}

}

func main() {
	go say("world")
	say("hello")
}

https://tour.golang.org/concurrency/1

Go’s approach to concurrency is very easy to work with. It’s an interesting approach compared to Node where the developer has to pay close attention to how asynchronous code is handled.

Another great aspect of concurrency in Go is the race detector. This makes it easy to figure out if there are any race conditions within your asynchronous code.

Here are a few good resources to get started with Go and channels:

Reason 5 – Fast Compile Time

Our largest micro service written in Go currently takes 6 seconds to compile. Go’s fast compile times are a major productivity win compared to languages like Java and C++ which are famous for sluggish compilation speed. I like sword fighting, but it’s even nicer to get things done while I still remember what the code is supposed to do:

XKCD - Code compiling before Go
XKCD – Code compiling before Go

Reason 6 – The Ability to Build a Team

First of all, let’s start with the obvious: there are not as many Go developers compared to older languages like C++ and Java. According to StackOverflow, 38% of developers know Java, 19.3% know C++ and only 4.6% know Go. GitHub data shows a similar trend: Go is more widely used than languages such as Erlang, Scala and Elixir, but less popular than Java and C++.

Fortunately, Go is a very simple and easy to learn language. It provides the basic features you need and nothing else. The new concepts it introduces are the “defer” statement and built-in management of concurrency with “go routines” and channels. (For the purists: Go isn’t the first language to implement these concepts, just the first to make them popular.) Any Python, Elixir, C++, Scala or Java dev that joins a team can be effective at Go within a month because of its simplicity.

We’ve found it easier to build a team of Go developers compared to many other languages. If you’re hiring people in competitive ecosystems like Boulder and Amsterdam this is an important benefit.

Reason 7 – Strong Ecosystem

For a team of our size (~20 people) the ecosystem matters. You simply can’t create value for your customers if you have to reinvent every little piece of functionality. Go has great support for the tools we use. Solid libraries were already available for Redis, RabbitMQ, PostgreSQL, Template parsing, Task scheduling, Expression parsing and RocksDB.

Go’s ecosystem is a major win compared to other newer languages like Rust or Elixir. It’s of course not as good as languages like Java, Python or Node, but it’s solid and for many basic needs you’ll find high-quality packages already available.

Reason 8 – Gofmt, Enforced Code Formatting

Let’s start with what is Gofmt? And no, it’s not a swear word. Gofmt is an awesome command line utility, built into the Go compiler for formatting your code. In terms of functionality it’s very similar to Python’s autopep8. While the show Silicon Valley portrays otherwise, most of us don’t really like to argue about tabs vs spaces. It’s important that formatting is consistent, but the actual formatting standard doesn’t really matter all that much. Gofmt avoids all of this discussion by having one official way to format your code.

Reason 9 – gRPC and Protocol Buffers

Go has first-class support for protocol buffers and gRPC. These two tools work very well together for building microservices which need to communicate via RPC. You only need to write a manifest where you define the RPC calls that can be made and what arguments they take. Both server and client code are then automatically generated from this manifest. This resulting code is both fast, has a very small network footprint and is easy to use.

From the same manifest, you can generate client code for many different languages even, such as C++, Java, Python and Ruby. So, no more ambiguous REST endpoints for internal traffic, that you have to write almost the same client and server code for every time. .

Disadvantages of Using Golang

Disadvantage 1 – Lack of Frameworks

Go doesn’t have a single dominant framework like Rails for Ruby, Django for Python or Laravel for PHP. This is a topic of heated debate within the Go community, as many people advocate that you shouldn’t use a framework to begin with. I totally agree that this is true for some use cases. However, if someone wants to build a simple CRUD API they will have a much easier time with Django/DJRF, Rails Laravel or Phoenix.

Update: as the comments pointed out there are several projects that provide a framework for Go. Revel, Iris, Echo, Macaron and Buffalo seem to be the leading contenders. For Stream’s use case we prefer to not use a framework. However for many new projects that are looking to provide a simple CRUD API the lack of a dominant framework will be a serious disadvantage.

Disadvantage 2 – Error Handling

Go handles errors by simply returning an error from a function and expecting your calling code to handle the error (or to return it up the calling stack). While this approach works, it’s easy to lose scope of what went wrong to ensure you can provide a meaningful error to your users. The errors package solves this problem by allowing you to add context and a stack trace to your errors.

Another issue is that it’s easy to forget to handle an error by accident. Static analysis tools like errcheck and megacheck are handy to avoid making these mistakes.

While these workarounds work well it doesn’t feel quite right. You’d expect proper error handling to be supported by the language.

Disadvantage 3 – Package Management

Go’s package management is by no means perfect. By default, it doesn’t have a way to specify a specific version of a dependency and there’s no way to create reproducible builds. Python, Node and Ruby all have better systems for package management. However, with the right tools, Go’s package management works quite well.

You can use Dep to manage your dependencies to allow specifying and pinning versions. Apart from that, we’ve contributed an open-source tool called VirtualGo which makes it easier to work on multiple projects written in Go.

Virtual Go
Virtual Go

 

Python vs Go

One interesting experiment we conducted was taking our ranked feed functionality in Python and rewriting it in Go. Have a look at this example of a ranking method:


{
	"functions": {
		"simple_gauss": {
			"base": "decay_gauss",
			"scale": "5d",
			"offset": "1d",
			"decay": "0.3"
		},
		"popularity_gauss": {
			"base": "decay_gauss",
			"scale": "100",
			"offset": "5",
			"decay": "0.5"
		}
	},
	"defaults": {
		"popularity": 1
	},
	"score": "simple_gauss(time)*popularity"
}

Both the Python and Go code need to do the following to support this ranking method:

  1. Parse the expression for the score. In this case, we want to turn this string “simple_gauss(time)*popularity” into a function that takes an activity as input and returns a score as output.
  2. Create partial functions based on the JSON config. For example, we want “simple_gauss” to call “decay_gauss” with a scale of 5 days, offset of 1 day and a decay factor of 0.3.
  3. Parse the “defaults” configuration so you have a fallback if a certain field is not defined on an activity.
  4. Use the function from step 1 to score all activities in the feed.

Developing the Python version of the ranking code took roughly 3 days. That includes writing the code, unit tests and documentation. Next, we’ve spent approximately 2 weeks optimizing the code. One of the optimizations was translating the score expression (simple_gauss(time)*popularity) into an abstract syntax tree. We also implemented caching logic which pre-computed the score for certain times in the future.

In contrast, developing the Go version of this code took roughly 4 days. The performance didn’t require any further optimization. So while the initial bit of development was faster in Python, the Go based version ultimately required substantially less work from our team. As an added benefit, the Go code performed roughly 40 times faster than our highly-optimized Python code.

Now, this is just a single example of the performance gains we’ve experienced by switching to Go. It is, of course, comparing apples to oranges:

  • The ranking code was my first project in Go
  • The Go code was built after the Python code, so the use case was better understood
  • The Go library for expression parsing was of exceptional quality

Your mileage will vary. Some other components of our system took substantially more time to build in Go compared to Python. As a general trend, we see that developing Go code takes slightly more effort. However, we spend much less time optimizing the code for performance.

Elixir vs Go – The Runner Up

Another language we evaluated is Elixir. Elixir is built on top of the Erlang virtual machine. It’s a fascinating language and we considered it since one of our team members has a ton of experience with Erlang.

For our use cases, we noticed that Go’s raw performance is much better. Both Go and Elixir will do a great job serving thousands of concurrent requests. However, if you look at individual request performance, Go is substantially faster for our use case. Another reason why we chose Go over Elixir was the ecosystem. For the components we required, Go had more mature libraries whereas, in many cases, the Elixir libraries weren’t ready for production usage. It’s also harder to train/find developers to work with Elixir.

These reasons tipped the balance in favor of Go. The Phoenix framework for Elixir looks awesome though and is definitely worth a look.

Conclusion

Go is a very performant language with great support for concurrency. It is almost as fast as languages like C++ and Java. While it does take a bit more time to build things using Go compared to Python or Ruby, you’ll save a ton of time spent on optimizing the code.

We have a small development team at Stream powering the feeds for over 200 million end users. Go’s combination of a great ecosystem, easy onboarding for new developers, fast performance, solid support for concurrency and a productive programming environment make it a great choice.

Stream still leverages Python for our dashboard, site and machine learning for personalized feeds. We won’t be saying goodbye to Python anytime soon, but going forward all performance-intensive code will be written in Go.

If you want to learn more about Go check out the blog posts listed below. To learn more about Stream, this interactive tutorial is a great starting point.

More Reading about Switching to Golang

Learning Go