
As many of us prepare for the PyCon conference, we wanted to share a little about how Python is used at Netflix. We use Python throughout the entire content life cycle: from deciding which series to fund to operating the CDN that serves video to our 148 million members. We contribute to many open-source Python packages, some of which are mentioned below. If something here interests you, take a look at our jobs site or find us at PyCon.
Open Connect
Open Connect is Netflix's content delivery network (CDN). A simple, though inaccurate, way to think about the Netflix infrastructure is this: everything that happens before you press Play on the remote control (for example, logging in, choosing a plan, the recommendation system, picking a title) runs in Amazon Web Services (AWS), and everything that happens afterward (i.e., streaming video) runs through Open Connect. Content is placed on the network of Open Connect CDN servers as close as possible to the end user, both to improve the viewing experience for our customers and to reduce costs for Netflix and our partners, the Internet service providers.
Designing, building, and operating a CDN requires a variety of software systems, many of them written in Python. The network devices at the core of the CDN are mostly managed by Python applications. Such applications track the inventory of our network equipment: which devices are in use, which models, which hardware components, and where they are located. The configuration of these devices is controlled by several other systems, including the source of truth, device-configuration applications, and backups. Device interaction for collecting health and other operational data is yet another Python application. Python has long been a popular language for networking because it is intuitive and lets engineers solve network problems quickly, and many useful libraries have been developed that make it even better suited to the task.
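As an illustration only (the device names and fields below are invented for this sketch, not Netflix's actual schema), tracking network-device inventory and health might look something like this:

```python
from dataclasses import dataclass

@dataclass
class Device:
    hostname: str
    model: str
    site: str
    healthy: bool

# Hypothetical in-memory "source of truth" for CDN network devices.
INVENTORY = [
    Device("oca-ams-001", "flash-rev2", "ams", True),
    Device("oca-ams-002", "storage-rev1", "ams", False),
    Device("oca-lhr-001", "flash-rev2", "lhr", True),
]

def unhealthy_devices(inventory):
    """Return hostnames of devices that need operator attention."""
    return [d.hostname for d in inventory if not d.healthy]

print(unhealthy_devices(INVENTORY))
```

A real system would back this with a database and feed it from the health-collection applications mentioned above.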
Demand engineering
Demand Engineering is responsible for regional failovers, traffic distribution, bandwidth operations, and server efficiency in the Netflix cloud. We can proudly say that our tools are built primarily in Python. The failover service uses numpy and scipy for numerical analysis, boto3 to make changes to our AWS infrastructure, rq to run asynchronous workloads, and it is all wrapped in a thin layer of Flask APIs. The ability to drop into a bpython shell and improvise has saved the day more than once.
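As a rough, hypothetical sketch of the kind of numerical analysis numpy enables here (this is not the actual failover service's logic), redistributing a failed region's traffic proportionally to the remaining regions' headroom could look like:

```python
import numpy as np

def redistribute(traffic, capacity, failed):
    """Shift a failed region's traffic onto the surviving regions,
    proportionally to their remaining headroom (simplified model)."""
    traffic = np.asarray(traffic, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    shifted = traffic.copy()
    moved = shifted[failed]
    shifted[failed] = 0.0
    headroom = capacity - shifted
    headroom[failed] = 0.0          # the failed region takes no traffic
    shifted += headroom / headroom.sum() * moved
    return shifted

# Three regions; region 1 fails and its 40 units of traffic move.
new_load = redistribute([50, 40, 30], [100, 100, 100], failed=1)
```

Total traffic is conserved while no surviving region is pushed past its capacity in this toy model.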
We actively use Jupyter Notebooks and nteract to analyze operational data and to prototype visualization tools that help us detect capacity regressions.
CORE
The CORE team uses Python for statistical analysis and alerting. We rely on many statistical and mathematical libraries (numpy, scipy, ruptures, pandas) to help automate the analysis of thousands of related signals when our alerting systems indicate problems. We have developed a time-series correlation system, used both inside and outside the team, as well as a distributed worker system for parallelizing large amounts of analytical work to deliver results quickly.
We also commonly use Python for automation tasks, data exploration and cleaning, and as a convenient visualization tool.
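A toy version of the time-series correlation idea, using only numpy (the real system is far more sophisticated): rank candidate signals by how strongly they correlate with the signal that triggered an alert.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)

# An "alerting" signal and a pool of candidates; "b" is constructed
# to move with the alert, while "a" and "c" are pure noise.
alert = np.sin(t / 10.0) + rng.normal(0, 0.1, t.size)
candidates = {
    "a": rng.normal(0, 1, t.size),
    "b": np.sin(t / 10.0) + rng.normal(0, 0.2, t.size),
    "c": rng.normal(0, 1, t.size),
}

def rank_by_correlation(alert, candidates):
    """Rank candidate signals by |Pearson correlation| with the alert."""
    scores = {name: abs(np.corrcoef(alert, s)[0, 1])
              for name, s in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

ranked = rank_by_correlation(alert, candidates)
```

Running this surfaces `"b"` as the most correlated signal, which is the kind of shortlist an on-call engineer would then investigate.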
Monitoring, Alerting, and Auto-Remediation
The Insight Engineering team is responsible for building and operating tools for real-time problem detection, alerting, diagnostics, and auto-remediation. With the increasing popularity of Python, the team now supports Python clients for most of its services. One example is the Spectator client library, used by code that records dimensional time-series metrics. We build Python libraries to interact with other Netflix platform services. Beyond the libraries, the Winston and Bolt products are built using Python frameworks (Gunicorn + Flask + Flask-RESTPlus).
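To give a flavor of what "dimensional time-series metrics" means, here is a toy in-memory registry where each counter is identified by a name plus a set of tags; note this is a deliberately simplified sketch, not Spectator's actual API:

```python
from collections import defaultdict

class Registry:
    """Toy dimensional-metrics registry: counters are keyed by a
    metric name plus an ordered set of tag key/value pairs."""
    def __init__(self):
        self._counters = defaultdict(int)

    def counter(self, name, **tags):
        key = (name, tuple(sorted(tags.items())))
        def increment(delta=1):
            self._counters[key] += delta
        return increment

registry = Registry()
inc = registry.counter("server.requests", status="200", device="tv")
inc()
inc()
```

The tags are what make the metric "dimensional": the same metric name can be sliced by status code, device type, and so on at query time.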
Information Security
The information security team uses Python for a number of important goals, including security automation, risk classification, and automated vulnerability identification and remediation. We have open-sourced a number of successful products, including Security Monkey (our most active open-source project). Python is used to protect our SSH resources with Bless. The infrastructure security team uses Python to configure IAM permissions with Repokid, and Python scripts help generate TLS certificates in Lemur.
Some of our recent projects include Prism, a batch framework that helps security engineers analyze the state of the infrastructure and identify risk factors and vulnerabilities in source code. We currently provide Python and Ruby libraries for Prism. Diffy, a digital forensics triage tool, is written entirely in Python. We also use Python to detect sensitive data with Lanius.
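A minimal illustration of sensitive-data detection with regular expressions (the patterns here are invented for the example; a real scanner such as Lanius is far more thorough and careful about false positives):

```python
import re

# Hypothetical patterns for two common sensitive-data shapes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(text):
    """Return the kinds of sensitive data detected in a blob of text."""
    return sorted(kind for kind, pat in PATTERNS.items() if pat.search(text))

found = scan("contact ops@example.com, key AKIAABCDEFGHIJKLMNOP")
```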
Personalization Algorithms
We use Python extensively within our machine learning infrastructure for personalization. This is where we train some of the models that power key aspects of the Netflix experience: from our recommendation algorithms to artwork selection and marketing algorithms. For example, some algorithms use TensorFlow, Keras, and PyTorch to train deep neural networks, XGBoost and LightGBM to train gradient-boosted decision trees, or the broader scientific Python stack (e.g., numpy, scipy, sklearn, matplotlib, pandas, and cvxpy). Because we are constantly trying out new approaches, we use Jupyter notebooks for many of our experiments. We have also developed a number of higher-level libraries that integrate notebooks with the rest of our ecosystem (e.g., data access, fact logging and feature extraction, model evaluation, and publishing).
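As a toy example of one common personalization technique (the numbers are invented and this is not a Netflix algorithm), titles can be ranked by the inner-product affinity between learned user and item embeddings:

```python
import numpy as np

# Toy embeddings, as a matrix-factorization model might learn them.
user_vec = np.array([0.9, 0.1])            # user leans toward axis 0
item_vecs = {
    "thriller": np.array([0.8, 0.2]),
    "comedy":   np.array([0.1, 0.9]),
    "drama":    np.array([0.6, 0.4]),
}

def recommend(user_vec, item_vecs, k=2):
    """Rank titles by inner-product affinity with the user vector."""
    scores = {title: float(user_vec @ v) for title, v in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

top = recommend(user_vec, item_vecs)
```

Real recommendation systems layer much more on top (context, freshness, exploration), but the dot-product scoring step is a useful mental model.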
Machine learning infrastructure
Beyond personalization, Netflix applies machine learning to hundreds of other use cases across the company. Many of these applications are powered by Metaflow, a Python framework that makes it easy to execute ML projects and take them from the prototype stage to production.
Metaflow pushes the limits of Python: we leverage well-parallelized and optimized Python code to fetch data at 10 Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores.
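The general pattern of fanning work out across workers and combining partial results can be sketched with the standard library (a simplified stand-in for illustration, not Metaflow's actual API, which orchestrates steps across many machines):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(shard):
    # Stand-in for real per-shard computation (feature extraction, etc.).
    return sum(x * x for x in shard)

def fan_out(values, workers=4):
    """Shard the data, process shards in a worker pool, combine results."""
    shards = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, shards))

total = fan_out(list(range(10000)))
```

At Netflix's scale the "workers" are remote machines rather than local threads, but the shard/map/combine structure is the same.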
Notebooks
We at Netflix are avid users of Jupyter notebooks, and we have written before about the reasons for and nature of this investment.
But Python plays a huge role in how we provide those services. It is our primary language for developing, debugging, exploring, and prototyping different interactions within the Jupyter ecosystem. We use Python to build custom extensions to the Jupyter server that allow us to manage tasks such as logging, archiving, publishing, and cloning notebooks on behalf of our users. We provide many flavors of Python to our users via different Jupyter kernels, and we manage the deployment of those kernel specifications using Python as well.
Orchestration
The Big Data Orchestration team is responsible for providing all of the services and tooling to schedule and execute ETL and ad-hoc pipelines.
Many of the orchestration components are written in Python, starting with our scheduler, which uses Jupyter notebooks with papermill to provide templatized job types (Spark, Presto, etc.). This gives our users a standardized and easy way to express the work that needs to be done. You can read more about it here. We also use notebooks as real production runbooks in situations that require human intervention, for example, to restart everything that has failed in the last hour.
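A hypothetical sketch of templatized job types (all names and fields here are invented): each job type maps to a parameterized notebook, and a job is simply a template plus user-supplied parameters that a tool like papermill could then execute against the notebook:

```python
# Hypothetical registry of templatized job types.
JOB_TEMPLATES = {
    "spark":  {"notebook": "templates/spark.ipynb",
               "defaults": {"executor_memory": "4g"}},
    "presto": {"notebook": "templates/presto.ipynb", "defaults": {}},
}

def build_job(job_type, **params):
    """Merge user parameters over template defaults into a job spec."""
    template = JOB_TEMPLATES[job_type]
    merged = {**template["defaults"], **params}
    return {"notebook": template["notebook"], "parameters": merged}

job = build_job("spark", query="SELECT 1", executor_memory="8g")
```

In the real flow, the resulting spec would be handed to papermill, which injects the parameters into the notebook and executes it.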
For internal use, we have built an event-driven platform that is written entirely in Python. It receives event streams from a number of systems and combines them into a single tool, which lets us define conditions for filtering events and for responding to or routing them. As a result, we have been able to decouple microservices and bring visibility into everything that happens on the data platform.
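The condition-based filtering and routing described above can be sketched in a few lines of Python (a minimal illustration, not the internal platform's actual design):

```python
# Handlers register a predicate; matching events are routed to them.
routes = []

def route(predicate):
    """Register a handler for events matching the predicate."""
    def register(handler):
        routes.append((predicate, handler))
        return handler
    return register

seen = []

@route(lambda e: e.get("severity") == "critical")
def page_oncall(event):
    seen.append(("page", event["id"]))

@route(lambda e: e.get("source") == "scheduler")
def log_scheduler(event):
    seen.append(("log", event["id"]))

def dispatch(event):
    for predicate, handler in routes:
        if predicate(event):
            handler(event)

dispatch({"id": 1, "severity": "critical", "source": "scheduler"})
dispatch({"id": 2, "severity": "info", "source": "api"})
```

An event may match several routes, which is how one stream can feed alerting, logging, and downstream microservices at once.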
Our team has also developed the pygenie client, which interfaces with the federated Genie job-execution service. Internally, we maintain additional extensions to this library that apply business conventions and integrate with the Netflix platform. These libraries are the primary way users interface programmatically with work on the big data platform.
Finally, our team has contributed to the open-source papermill and scrapbook projects, adding code for both our own and external use cases. Our efforts have been well received in the open-source community, which we are very pleased about.
Experimentation platform
The scientific computing team builds a platform for experimentation: A/B tests and other experiments. Scientists and engineers can innovate in three areas: data, statistics, and visualization.
Our metrics repository is a Python framework based on PyPika that allows contributors to write reusable, parameterized SQL queries. It is the entry point for any new analysis.
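The idea of reusable, parameterized queries can be illustrated with a plain function (PyPika itself offers a much richer, composable query-builder API; the table and metric names here are invented):

```python
def daily_metric_query(metric, table, start, end):
    """Build a reusable daily-aggregation query for a given metric.
    A toy string builder; PyPika composes these safely and fluently."""
    return (
        f"SELECT date, SUM({metric}) AS {metric} "
        f"FROM {table} "
        f"WHERE date BETWEEN '{start}' AND '{end}' "
        f"GROUP BY date"
    )

sql = daily_metric_query("play_starts", "metrics.daily",
                         "2019-04-01", "2019-04-07")
```

The payoff is that each new analysis reuses one audited query shape instead of hand-writing SQL from scratch.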
Our causal inference library is built on Python and R: it gives scientists the opportunity to explore new models of causal effects. It uses PyArrow and rpy2 so that statistics can be computed in either language.
Our visualization library is based on Plotly. Since Plotly is a widely adopted specification for visualizations, there are a variety of tools whose output can feed into our platforms.
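One reason Plotly works well as a common target is that a Plotly figure is, at bottom, a declarative JSON specification, so any tool that can emit the right JSON can drive the same rendering pipeline. A minimal hand-built example:

```python
import json

# A bar chart expressed directly as a Plotly-style figure spec
# (invented data; normally the plotly library builds this for you).
figure = {
    "data": [
        {"type": "bar",
         "x": ["Mon", "Tue", "Wed"],
         "y": [120, 180, 150]},
    ],
    "layout": {"title": {"text": "Signups per day"}},
}

spec = json.dumps(figure)
```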
Partner ecosystem
The Partner Ecosystem group uses Python to test Netflix applications on devices. Python forms the core of our continuous-integration infrastructure, including managing our orchestration servers, managing Spinnaker, querying and filtering test cases, and scheduling test runs on devices and containers. Additional post-run analysis is done in Python using TensorFlow to determine which tests are most likely to show problems on which devices.
Video Encoding and Media Cloud Engineering
Our team takes care of encoding (and re-encoding) the Netflix catalog, and also uses machine learning to gain insights into that catalog.
We use Python for about 50 projects, such as vmaf and mezzfs, build computer-vision solutions using a map-reduce platform called Archer, and use Python for many internal projects.
We have also open-sourced a few tools to ease development and distribution of Python projects, such as setupmeta and pickley.
Netflix Animation and NVFX
Python is the industry standard for all of the major applications we use to create animated and VFX content, so it goes without saying that we use it heavily. All of our integrations with Maya and Nuke are in Python, and the bulk of our Shotgun tooling is in Python as well. We have just started building tooling in the cloud, and we expect to deploy many custom Python AMIs and containers there.
Content machine learning, science and analytics
The content machine learning team uses Python extensively to develop machine learning models that are the core of forecasting audience size, viewership, and other demand metrics for all content.