Introduction to Microservices for Data Scientists

As a brief definition, microservices is an architectural style used in software development. Every data scientist needs at least a minimal knowledge of DevOps concepts to be able to deploy their own apps.

In this article, we are going to answer the following questions: What are microservices? Why do we need them? How do they communicate with each other? What is monolithic architecture? What is the difference between monorepo and polyrepo?

Before microservices

Before microservices architectures were adopted, the standard was monolithic architecture.

What is monolithic architecture?

In a monolithic architecture, all code components are part of a single unit (one codebase), which in practice means all the software is written in the same language and uses the same tech stack.

Figure: monolithic architecture compared to microservices (source: https://www.divante.com/blog/monolithic-architecture-vs-microservices)

Consequently, all the code is developed, deployed, and scaled as one unit. Any change in one part of the code can affect the whole software and requires redeploying the entire application.

Monolithic architecture challenges:

As the application grows, it becomes difficult for teams to update their own code because the components’ code gets tangled together. The whole app also has to be redeployed after updating a single service’s code, which leads to higher infrastructure cost.

The release process of a monolithic app takes longer because changing a single service requires testing the entire application. Also, a bug in one part of the app can bring down the whole app.

Why microservices

As a solution to the previous challenges of monolithic architecture, microservices came into the picture.

What are microservices

Microservices is an architectural style where the application is split into smaller independent services instead of one codebase.

These services are split based on business functionalities rather than technical functionalities. For example, in an e-commerce app, the shopping cart is handled by one service and the checkout by another.

Another characteristic of these services is that they are loosely coupled. This means each service can be built and deployed separately.

How do microservices communicate?

To exchange data, microservices can use different communication methods:

  1. API calls: each service exposes its own API, and services send HTTP requests to each other’s API endpoints.
  2. Message broker: services may use an intermediary (a message broker) to send messages to each other. The sending service publishes a message to the broker, which forwards it to the receiving service.
  3. Service mesh: when a service mesh is employed, all inter-service communication is routed through proxies, which can provide networking capabilities such as encryption and load balancing.
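
As a rough illustration of the first two methods in the list above, here is a minimal Python sketch that uses only the standard library. The cart data, the /cart endpoint, the fetch_cart function, and the in-memory MessageBroker class are all hypothetical names made up for this example. A real system would typically use a web framework for the API and a dedicated broker such as RabbitMQ or Kafka instead of an in-process queue.

```python
import http.client
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Method 1 (API call): a hypothetical cart service exposes an HTTP endpoint.
CART = {"items": ["laptop", "mouse"]}  # data owned by the cart service


class CartHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/cart":
            body = json.dumps(CART).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_cart(host, port):
    """Checkout-side code: ask the cart service for its data over HTTP."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/cart")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def demo_api_call():
    """Start the cart service on a free local port and call it once."""
    server = HTTPServer(("127.0.0.1", 0), CartHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_cart("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


# Method 2 (message broker): an in-memory stand-in with one queue per topic.
class MessageBroker:
    def __init__(self):
        self._topics = {}

    def publish(self, topic, message):
        self._topics.setdefault(topic, queue.Queue()).put(message)

    def consume(self, topic):
        # Raises queue.Empty if nothing arrives within one second.
        return self._topics.setdefault(topic, queue.Queue()).get(timeout=1)


if __name__ == "__main__":
    print(demo_api_call())
    broker = MessageBroker()
    broker.publish("orders", {"order_id": 1})  # sent by the checkout service
    print(broker.consume("orders"))            # received by another service
```

Note the difference in coupling: with the API call, the checkout side must know the cart service's address and waits for its answer, while with the broker the sender only knows a topic name and does not wait for the receiver.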

Microservices challenges:

  • Being a distributed system, a microservices app is more complex than a monolith.
  • Communication can fail when one service’s API is down.
  • It is hard to monitor multiple services at once.
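
One common way to soften the second challenge is to retry a failed call a few times and fall back to a default value if the other service stays down. The sketch below is only an assumption about how this could look: call_with_retry and its parameters are hypothetical names for this example, not part of any library.

```python
import time


def call_with_retry(call, retries=3, delay_seconds=0.1, fallback=None):
    """Call another service, retrying on connection problems.

    `call` is any zero-argument function that performs the request.
    Returns the call's result, or `fallback` if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(delay_seconds * attempt)
    return fallback


if __name__ == "__main__":
    def cart_service_is_down():
        raise ConnectionError("cart service unreachable")

    # The caller gets an empty cart instead of crashing.
    print(call_with_retry(cart_service_is_down, retries=2, fallback={"items": []}))
```

The fallback keeps the calling service alive, but it should be chosen with care: showing an empty cart is acceptable, while silently skipping a payment would not be.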

CI/CD for microservices

The CI/CD pipeline is a series of automated procedures that take code from version control and build, test, and deploy it to production. The pipeline divides the software delivery process into stages. Each stage comprises distinct tasks that can run concurrently. When all tasks in a stage have completed, the next stage starts. The primary purpose of CI/CD is to shorten the release cycle. As a result, end users get new versions of the program, with new features and bug fixes, sooner.
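
The stage logic described above can be sketched in a few lines of Python. This is a toy model, not how any particular CI/CD tool is implemented: run_pipeline, the stage names, and the convention that a task returns True on success are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(stages):
    """Run stages in order; tasks inside one stage run concurrently.

    `stages` is a list of (stage_name, [task, ...]) pairs, where each task
    is a zero-argument function returning True on success.
    Returns the names of the stages that completed successfully and stops
    at the first stage in which any task fails.
    """
    completed = []
    with ThreadPoolExecutor() as pool:
        for name, tasks in stages:
            # list() waits until every task of this stage has finished.
            results = list(pool.map(lambda task: task(), tasks))
            if not all(results):
                break  # a failed stage blocks all later stages
            completed.append(name)
    return completed


if __name__ == "__main__":
    pipeline = [
        ("build", [lambda: True]),
        ("test", [lambda: True, lambda: True]),  # e.g. unit and integration tests
        ("deploy", [lambda: True]),
    ]
    print(run_pipeline(pipeline))
```

If one of the test tasks returned False, the deploy stage would never start, which is the safety property a pipeline is meant to give.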

Developing a dependable continuous integration/continuous delivery (CI/CD) workflow for a microservices architecture can be difficult. Individual teams must be able to release their services quickly and reliably without interfering with other teams or making the application unstable.

[source]

Monorepo vs polyrepo

Microservices code can be handled in two ways:

  1. Monorepo: the code of many projects is managed in a single repository.
  2. Polyrepo: the code is managed in multiple repositories.

Figure: polyrepo vs monorepo in microservices [source]

One tool to manage microservices is Kubernetes.

What is Kubernetes

Kubernetes (also known as k8s or “Kube”) is an open-source container orchestration platform that automates many of the manual tasks required in containerized application deployment, management, and scaling.

This makes it well suited to complex applications made up of numerous microservices.

End notes:

Every data scientist needs to know the bare minimum when it comes to MLOps and DevOps. I hope this article was helpful. If you have any questions, leave them in the comment section below.

Resources:

https://www.youtube.com/watch?v=rv4LlmLmVWk

https://cloud.google.com/architecture/microservices-architecture-introduction