Image Sorting and Composition Assessment Part 2 - Image Quality Assessment

Approaching photo quality assessment from a Machine Learning point of view

Image Quality Assessment overview

In the previous post, I examined the continual learning aspects of the culling pipeline. But the most important logic in the pipeline is the Image Quality Assessment (IQA) portion, which quantifies the actual quality of the images based on characteristics like noise level, closed eyes, lighting, and various other composition metrics.

There are many things to consider when formulating IQA in the image culling setting. Do we want to measure noise level and all the other metrics separately, or produce a single unified quality score? Do we need to create ensemble models? How are we going to generate the test datasets? But before any of the above ideas are addressed, it is important to see how IQA is generally done.

Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild

This is a paper that I came across at the CVPR 2023 conference.

Image Sorting and Composition Assessment Part 1 - Online Learning

Approaching photo quality assessment from a Machine Learning point of view

Image Quality Assessment overview

Processing of 2D images is one of the most robust fields within Machine Learning due to the high availability of data. Image quality assessment (IQA) for 2D images is an important area within 2D image processing, and it has many practical applications, one being photo culling. Photo culling is a tedious, time-consuming process through which photographers select the best few images from a photo shoot for editing and final delivery to clients. Photographers take thousands of photos in a single event, and they often manually screen photos one by one, spending hours of effort. The criteria that photographers look for are:

Closed eyes

You always want the subject to have their eyes open.

Noise level

Images can be noisy when there is motion in the subject or the camera itself.

Near duplicate images

Photographers often take images in bursts, and there will be many unwanted duplicates.

Proper light

Selecting images with adequate lighting and good color composition

Rule of thirds

Photographers often place the subject according to the rule of thirds. Composition can be subjective and often varies according to the purpose of the photo.

Poses, emotion

This is also subjective, but photographers look for certain poses and emotions when selecting photos, and will select the ones that look most natural to them.

Machine learning and IQA

IQA is definitely one of the areas where Machine Learning algorithms can excel. However, there are many criteria (defined above) that it must fulfill, and the biggest hurdle is that because every photographer has different preferences, you cannot expect one pretrained model to satisfy all the end users. We should implement a Continual Learning (CL) process, so that we do not have to retrain the entire pipeline of models from scratch. When end users perform their own version of the culling process based on the suggestions of the pretrained model, this becomes the new stream of data for CL training. The original IQA model should be tuned on this new data, with the aim of minimizing training time and catastrophic forgetting, where the model forgets what it learned before the CL process began.

Now, we need to think about the architecture of this entire pipeline. First, it may be difficult to combine IQA and near-duplicate detection (NDD) models together. Intuitively, an NDD model should break down the images into N sets, and within each set, IQA will determine the quality of the images. And there must be a CL algorithm on top of everything. Ideally, both NDD and IQA should be continuously trainable; if not, at least the IQA portion of the pipeline should be.

Another thing to consider is EXIF data. EXIF (Exchangeable Image File Format) files store important data about photographs. Almost all digital cameras create these data files each time you snap a new picture. An EXIF file holds information about the image itself, such as the exposure level, where you took the photo, and any settings you used. Apart from the photo itself, EXIF data could hold important details for NDD and IQA. We should check whether incorporating the EXIF data improves the results. It could be totally unnecessary, just making the network heavier, but it could prove useful as well.
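To make this concrete, here is a minimal sketch of pulling a few EXIF fields with Pillow. The specific tags and the idea of feeding them to NDD/IQA as side features are my own assumptions, not a settled design; tag availability depends on the camera.

# Sketch: read a few EXIF fields that could serve as extra NDD/IQA features.
from PIL import Image, ExifTags

def read_exif_features(path):
    exif = Image.open(path).getexif()
    # Exposure-related tags live in the Exif sub-IFD (0x8769); merge them in.
    merged = dict(exif)
    merged.update(exif.get_ifd(0x8769))
    named = {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in merged.items()}
    return {key: named.get(key) for key in ("ExposureTime", "ISOSpeedRatings", "FNumber", "DateTime")}

print(read_exif_features("photo.jpg"))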

Now, in terms of the representation of the architecture, something like below could make sense:

Possible ideas for IQA and NDD portion of the architecture. Referenced from https://utorontomist.medium.com/photoml-photo-culling-with-machine-learning-908743e9d0cb.

For NDD and IQA, I already have some idea of how they would work even before doing proper research, but the idea of continual learning is very vague to me. How is it supposed to be implemented? Does it work on any architecture? What are the limitations?

I feel it would only make sense to look into IQA and NDD closely after having a better grasp of CL. Therefore, for the first part of the post, I will check out Continual Learning.

Lifelong Learning (LL)

If you think about how humans learn, we always retain the knowledge learned in the past and use it to help future learning and problem solving. When faced with a new problem or a new environment, we can adapt our past knowledge to deal with the new situation and also learn from it. The author of Lifelong Machine Learning defines the process of imitating this human learning capability as Lifelong Learning (LL), and states that terms like online learning and continual learning are subsets of LL.

Online Learning overview

What is Online Learning exactly? Online learning (OL) is a learning paradigm where the training data points arrive in sequential order. This could be because the entire dataset is not available yet, for reasons like annotation being too expensive, or, as in our case, because we need the users to generate the data.

When a new data point arrives, the existing model is quickly updated to produce the best model so far. Retraining on all available data whenever a new data point arrives would be too expensive, and it becomes impossible when the model is large. Furthermore, by the time retraining finishes, the model being served could already be out of date. Thus, online learning methods are typically memory- and run-time-efficient due to the latency requirements of real-world scenarios.
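As a rough illustration (my own toy sketch, not tied to any particular IQA model), an online update in PyTorch is just a single gradient step per incoming sample:

# Minimal online-learning sketch: one SGD step per arriving (x, y) pair.
# The tiny linear model and synthetic stream are placeholders.
import torch

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

def data_stream(n=1000):
    for _ in range(n):
        x = torch.randn(16)
        yield x, x.sum().unsqueeze(0)  # toy target

for x, y in data_stream():
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()  # the model is updated immediately, with no full retraining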

Much of the online learning research focuses on a single domain/task; the objective is to learn more efficiently from data arriving incrementally. LL, on the other hand, aims to learn from a sequence of different tasks, retain the knowledge learned so far, and use that knowledge to help future task learning. Online learning does none of these.

Continual learning overview

The concept of OL is simple. The sequence of data gets unlocked at every stage, and training is performed sequentially to make the model more robust. Continual learning is quite similar, but more sophisticated: it needs to learn various new tasks incrementally while sharing parameters with old ones, without forgetting.

Continual learning is generally divided into three categories:

In CL, there are three possible scenarios. It is very important to understand the distinction.

Task Incremental learning

Let’s look at the first scenario. It is best described as the case where an algorithm must incrementally learn a set of distinct tasks. Oftentimes the task identity is explicit, and it does not require a separate network. If a completely different kind of task is required (e.g. classification vs. object detection), a separate network and weights would be needed, and there would be no forgetting at all.

The challenge with task-incremental learning, therefore, is not (or should not be) simply to prevent catastrophic forgetting, but rather to find effective ways to share learned representations across tasks, to optimize the trade-off between performance and computational complexity, and to use information learned in one task to improve performance on other tasks.

Now assume this scenario for MNIST:

Each task here is binary classification of two digits, using the same network structure.

So the objective here would be to ensure that learning Task N makes learning Task N+1 easier, and, for the above example, that even after learning all the way up to Task 5, the model can still perform well on Task 1 without forgetting.

Thus, when we have some input space X and output space Y, with task context C, we learn to perform \[f : X \times C \rightarrow Y\]
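As a hypothetical illustration of this setup, a shared backbone with one output head per task makes the dependence on the context C explicit (sizes and task count are arbitrary):

# Sketch of a task-incremental model: shared backbone, one head per task.
# The task id plays the role of the context C in f(x, c).
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, num_tasks, in_dim=784, hidden=256, classes_per_task=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, classes_per_task) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        return self.heads[task_id](self.backbone(x))  # f(x, c)

model = MultiHeadNet(num_tasks=5)
logits = model(torch.randn(8, 784), task_id=0)  # Task 1: digits {0, 1}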

Domain Incremental learning

In this scenario, the structure of the problem is always the same, but the context or input-distribution changes.

Domain Incremental Learning is Task-Incremental learning without task information.

It is similar to task-incremental learning in that at every training round n, we expect the model to learn from dataset \[D_n = \{(x_i, y_i)\}\]

But at test time, we do not know which “task” a sample came from. This is probably the most relevant case for our pipeline: the available labels (classes) remain the same, while the domain keeps changing, and we do not care to provide task context. Thus, intuitively, this is: \[f : X \rightarrow Y\]

Class Incremental learning

The final scenario is class-incremental learning. This is the most popular field of study within continual learning, and most continual learning papers at CVPR targeted class-incremental learning as well, because this is the area where catastrophic forgetting (CF) tends to happen most severely. This scenario is best described as the case where an algorithm must incrementally learn to discriminate between a growing number of objects or classes.

Data arrives incrementally as batches of per-class sets, i.e. (\(X^1\), \(X^2\), …, \(X^t\)), where \(X^y\) contains all images from class y. Each round of learning from one batch of classes is considered a task \(T\). At each step \(T\), complete data is only available for the new classes \(X^t\), and only a small number of samples from previously learned classes are available in a memory buffer. If no rehearsal technique is used, the memory buffer is not available at all. The algorithm must map out the global label space, and at test time it will not know which step the data came from.

Expected input and output. In many practical applications, the data will arrive in a more mixed-up fashion.

This is out of the scope of this research. Why?

The new data that users add over time will span multiple classes, and a lot of it will be from previously seen classes, so the strict protocol of class-incremental learning would not make sense.

Single head vs Multi-head evaluation

The system should be able to differentiate the tasks and achieve successful inter-task classification without prior knowledge of the task identifier (i.e. which task the current data belongs to). In the single-head case, the neural network output should consist of all the classes seen so far, so the output is evaluated on a global scale. In contrast, multi-head evaluation only deals with intra-task classification, where the network output only consists of a subset of all the classes.

For MNIST, say that the dataset is broken down by classes into 5 tasks, where we have [{0,1}, …, {8,9}]

In the multi-head setting, we would know the task number (let’s say Task 5), and we would only be making a prediction between the subset {8,9}. Typically task-incremental learning falls under this.

In the single-head setting, evaluation is done across all ten classes, {0,…,9}. Typically domain-incremental and class-incremental learning fall under this.

Types of OL/CL algorithm

My conclusion after this research is that although some authors make clear distinctions between lifelong learning, online learning, and continual learning, most use them interchangeably. Regardless of whether it is OL or CL, these properties are shared:

  1. Online model parameter adaptation: How to update the model parameters with the new data?
  2. Concept drift: How to deal with data when the data distribution changes over time?
  3. The stability-plasticity dilemma: If the model is too stable, it can’t learn from the new data. When the model is too plastic, it forgets knowledge learned from the old data. How to balance these two?
  4. Adaptive model complexity and meta-parameters: The presence of concept drift or the introduction of a new class of data might necessitate updating and increasing model complexity against limited system resources.

Catastrophic forgetting is the biggest challenge to solve.

So the main challenge, as everyone agrees, is to make the model plastic enough to learn from new data, but stable enough not to forget previously gained knowledge. Finding this balance is the key to online learning. Now we look at some of the methods.

Replay methods

Replay methods are based on repeatedly exposing the model to the new data along with data on which it has already been trained. New data is combined with sampled old data, which can be used for rehearsal (thus retaining training data of old sub-tasks) or as constraint generators. If the retained samples are used for rehearsal, they are trained together with samples of the current sub-task; if not, they are used directly to characterize valid gradients.

The main differences between replay methods lie in the following:

  • How examples are picked and stored
  • How examples are replayed
  • How examples update the weights
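For example, a reservoir-sampling buffer is one simple way to pick and store examples (my own minimal sketch; actual replay methods differ exactly in the three points above):

# Minimal reservoir-sampling replay buffer: keeps a bounded, roughly uniform
# sample of everything seen so far, to be mixed into each new training batch.
import random

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:  # replace an old slot with decreasing probability
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

# Usage sketch: buffer.add((x, y)) on every new example,
# then train on new_batch + buffer.sample(32).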

Drawbacks

The storage cost goes up linearly with the number of tasks. Yes, we are sampling from older data, but as rounds progress, the storage required for older data keeps increasing.

Relevant papers

CVPR

Parameter Isolation methods

Parameter isolation methods try to assign certain neurons or network resources to a single sub-task. Dynamic architectures grow, e.g., with the number of tasks, so for each new sub-task new neurons are added to the network that do not interfere with previously learned representations. This is not relevant to us, since we do not want to add additional tasks to the network.

Regularization methods

Regularization methods work by adding additional terms to the loss function. Parameter regularization methods study the effect of neural network weight changes on task losses and limit the movement of the important ones, so that the model does not deviate too much from what it learned previously.
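A hedged sketch of the idea, loosely in the spirit of EWC: penalize parameters for drifting away from the values learned on the previous task, weighted by an importance estimate (here just a placeholder tensor per parameter).

# Sketch of a parameter-regularization penalty (EWC-style, simplified).
# `importance` would normally come from e.g. the Fisher information.
import torch

def regularization_penalty(model, old_params, importance, lam=100.0):
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

# total_loss = task_loss + regularization_penalty(model, old_params, importance)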

Relevant papers

CVPR

Knowledge Distillation methods

Originally designed to transfer the learned knowledge of a larger network (teacher) to a smaller one (student), knowledge distillation methods have been adapted to reduce activation and feature drift in continual learning. Unlike regularization methods that regularize the weights, KD methods regularize the network's intermediate outputs.
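A minimal sketch of the distillation term: keep the new model's outputs close to those of a frozen copy taken before the update (the temperature and weighting are assumptions, not values from any specific paper).

# Knowledge-distillation term for continual learning (sketch).
# `old_logits` come from a frozen copy of the network made before training on new data.
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, T=2.0):
    # Soften both distributions and match them with KL divergence.
    p_old = F.softmax(old_logits / T, dim=1)
    log_p_new = F.log_softmax(new_logits / T, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * (T * T)

# total_loss = ce_loss(new_logits, labels) + alpha * distillation_loss(new_logits, old_model(x))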

Relevant papers

CVPR

Learning without Forgetting

Now that we have some general idea of how incremental learning works, let’s look at a paper in depth. Learning without Forgetting, released in 2017, is probably one of the earliest and most famous works in this field. It may not be perfectly suitable for our scenario, where the label space overlaps with previous tasks and may or may not contain novel classes, but it is still important for understanding the general mechanism.

SequeL

This is based on this CVPR paper. SequeL is a flexible and extensible library for Continual Learning that supports both PyTorch and JAX frameworks. SequeL provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.

AWS Infrastructure part2 - ECS, and Terraform

Designing infrastructure using AWS services and Terraform

Understanding AMI and ECS

It’s time to take things one step further and explore the features of Terraform to learn about Infrastructure as Code (IaC). But before we do so, let’s take a look at the components that will be the main building blocks of this tutorial: AMI and ECS.

Amazon Machine Images (AMI)

First of all, what is AMI?

If we want to launch virtual machine instances from a template, the AMI is the blueprint configuration for the instances, in effect a backup of an entire EC2 instance. When one of your machines dies in the pipeline, an AMI can be used to quickly spin up a replacement and fill the gap. During the AMI-creation process, Amazon EC2 creates snapshots of your instance’s root volume and any other Amazon Elastic Block Store (Amazon EBS) volumes attached to the instance. This allows you to replicate both an instance’s configuration and the state of all of the EBS volumes that are attached to that instance.

After you create and register an AMI, you can use it to launch new instances. You can copy an AMI within the same AWS Region or to different AWS Regions. When you no longer require an AMI, you can deregister it.

Packer

All the dependencies and relevant packages that need to be installed into the AMI require constant updates. You do not want to create the AMI manually. You want to be able to look for updates and, when there are any, automatically build the images using tools like Packer or EC2 Image Builder. Packer is the better alternative when using Terraform, as it is also built by HashiCorp and can be used across different cloud providers.

ECS? Why not EKS?

We have the AMI ready via Packer. What is next? We need ECS to use the images to spin up virtual machines, and run containers within these machines to efficiently handle the jobs in the backend and deliver the results back to the clients.

Let’s check what ECS is conceptually. AWS Elastic Container Service (ECS) is Amazon’s homegrown container orchestration platform. Container orchestration, that sounds awfully familiar, doesn’t it? Yes, Kubernetes. AWS has over 200 services, and quite a few of them are related to containers. Why not use Elastic Kubernetes Service or AWS Fargate?

Comparing EKS vs ECS side by side, and when to use ECS, EKS, or Fargate.

EKS

As we explored in the previous tutorial, Kubernetes (K8s) is a powerful tool for container orchestration, and is the most prevalent option in the DevOps community. Amazon Elastic Kubernetes Service (EKS) is Amazon’s fully managed Kubernetes-as-a-Service platform that enables you to deploy, run, and manage Kubernetes applications, containers, and workloads in the AWS public cloud or on-premises.

When your app uses K8s, it makes the most sense to use EKS.

Using EKS, you don’t have to install, run, or configure Kubernetes yourself. You just need to set up and link worker nodes to Amazon EKS endpoints. Everything is scaled automatically based on your configuration, deploying EC2 machines and Fargate containers based on your layout.

ECS

What about ECS? ECS is a fully managed container service that enables you to focus on improving your code and service delivery instead of building, scaling, and maintaining your own cluster management infrastructure. Kubernetes can be very handy, but difficult to coordinate, as it requires additional layers of logic. ECS ultimately reduces management overhead and scaling complexities, while still offering a good degree of the benefits of container orchestration.

ECS is simpler to use, requires less expertise and management overhead, and can be more cost-effective. EKS can provide higher availability for your service, at the expense of additional cost and complexity. Thus my conclusion is that, setting features like the Kubernetes control plane aside, if you are not dealing with a service where availability is the most crucial aspect and cost matters more, ECS should be the preferred option.

ECS more in depth

Okay, now it makes sense why we want to go with ECS for now, to get things up to speed simply with Docker container images. Let’s dig deeper into various aspects of ECS. The important concepts to understand are Cluster, Task, and Service.

Cluster, Task, and Service are the basic building blocks of ECS.

Cluster

ECS runs your containers on a cluster of Amazon EC2 (Elastic Compute Cloud) virtual machine instances pre-installed with Docker. In our case, we might want to use the ECS GPU-optimized AMI in order to run deep-learning-related applications, like the NVIDIA Triton Server. An Amazon ECS cluster groups together tasks and services, and allows for shared capacity and common configurations. These statistics can be integrated and monitored closely in AWS CloudWatch.

A Cluster can run many Services. If you have multiple applications as part of your product, you may wish to put several of them on one Cluster. This makes more efficient use of the resources available and minimizes setup time.
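As a quick way to inspect this from code, here is a hedged boto3 sketch (the cluster and service names are hypothetical) that lists clusters and checks a service's task counts:

# Sketch: inspect ECS clusters and services with boto3.
# Cluster/service names are placeholders; region and credentials come from your AWS config.
import boto3

ecs = boto3.client("ecs")

print("Clusters:", ecs.list_clusters()["clusterArns"])

resp = ecs.describe_services(cluster="my-cluster", services=["my-service"])
for svc in resp["services"]:
    print(svc["serviceName"], "desired:", svc["desiredCount"], "running:", svc["runningCount"])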

Task

This is the blueprint describing which Docker containers to run, and it represents your application. In our example, it would be two containers. The task definition details the images to use, the CPU and memory to allocate, environment variables, ports to expose, and how the containers interact.

Containers should be based on an image registered in ECR.

In Terraform, this would look like:

resource "aws_ecs_task_definition" "service" {
  family                = "service"
  container_definitions = file("task-definitions/service.json")

  volume {
    name = "service-storage"

    docker_volume_configuration {
      scope         = "shared"
      autoprovision = true
      driver        = "local"

      driver_opts = {
        "type"   = "nfs"
        "device" = "${aws_efs_file_system.fs.dns_name}:/"
        "o"      = "addr=${aws_efs_file_system.fs.dns_name},rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
      }
    }
  }
}

Service

A Service is used to guarantee that you always have some number of Tasks running at all times. If a Task’s container exits due to an error, or the underlying EC2 instance fails and is replaced, the ECS Service will replace the failed Task. This is why we create Clusters: so that the Service has plenty of resources in terms of CPU, memory, and network ports to use.

Here is a sample ECS service for MongoDB defined in Terraform.

resource "aws_ecs_service" "mongo" {
  name            = "mongodb"
  cluster         = "${aws_ecs_cluster.foo.id}"
  task_definition = "${aws_ecs_task_definition.mongo.arn}"
  desired_count   = 3
  iam_role        = "${aws_iam_role.foo.arn}"
  depends_on      = ["aws_iam_role_policy.foo"]

  ordered_placement_strategy {
    type  = "binpack"
    field = "cpu"
  }

  load_balancer {
    target_group_arn = "${aws_lb_target_group.foo.arn}"
    container_name   = "mongo"
    container_port   = 8080
  }

  placement_constraints {
    type       = "memberOf"
    expression = "attribute:ecs.availability-zone in [us-west-2a, us-west-2b]"
  }
}

Loadbalancer

Okay, a Task is our application and a Service ensures our Tasks keep running. When requests come in, how do we know which ECS instance, or which Task, a request should be allocated to in order to maximize efficiency?

Read this: https://www.pulumi.com/docs/clouds/aws/guides/elb/

Auto Scaling Group

When the virtual machines are created, they are registered to the cluster. Each cluster continuously monitors overall resource usage. Amazon ECS can manage the scaling of Amazon EC2 instances that are registered to your cluster. This is referred to as Amazon ECS cluster auto scaling.


When you use an Auto Scaling group capacity provider with managed scaling turned on, you set a target percentage (the targetCapacity) for the utilization of the instances in this Auto Scaling group. Amazon ECS then manages the scale-in and scale-out actions of the Auto Scaling group based on the resource utilization that your tasks use from this capacity provider.

AWS Infrastructure part1 - Lambda, SQS and more

Designing infrastructure using AWS services

In this post, I will create an HTTP API that routes to SQS and then Lambda.

AWS has hundreds of different products, and Simple Queue Service (SQS) is one of them (one of the earliest to become available). I have written unit tests for SQS-based services, but I realized that I did not know exactly how they operate, nor did I know the exact reason for using them. Having created a test AWS account, I thought it would be great to build some examples and understand the mechanism underneath. On top of SQS, I am also going to explore AWS Lambda and API Gateway to build a sample application.

Understanding SQS

We first need to understand why the heck we need SQS. I built a couple of test applications and they seem to work just fine, don’t they?

Amazon Simple Queue Service is a fully managed job queue with messages stored across multiple data centers.

Why do we need SQS?

Applications nowadays are collections of distributed systems, and to efficiently pass data or logic between these services, you need a distributed message broker service that helps establish reliable, secure, and decoupled communication. One of the best features of SQS is that it lets you transmit any volume of data, at any level of throughput, and it scales automatically. SQS ensures that your message is delivered at least once, and it also allows multiple consumers and senders to communicate using the same message queue. It offers two queue types:

  • Standard Queues: give the highest throughput, best-effort ordering, and at-least-once delivery.
  • FIFO Queues: First-In, First-Out Queues ensure that messages are processed only once, in the sequence in which they are sent.

A queue is just a temporary repository for messages awaiting processing and acts as a buffer between the component producer and the consumer. SQS provides an HTTP API over which applications can submit items into and read items out of a queue.

Use cases of SQS Service

Okay, now we understand roughly what SQS is. In what kind of circumstances would you use it?

For example, let’s say you have a service where people upload photos from their mobile devices. Once the photos are uploaded, your service needs to do a bunch of processing of the photos, e.g. scaling them to different sizes, applying different filters, extracting metadata. Each user’s request is put into a queue asynchronously, waiting to be processed; once it is processed, a completion message is sent back. When a server dies for some reason (services like Kubernetes will restore it), its messages go back to the queue and are picked up by another server. There is also the visibility timeout: if a message is not deleted by the consumer in time, it returns to the queue, and during this period no other consumers can receive or process it.

The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.

So to reiterate, SQS does not do any load balancing or actual processing of the requests. It just acts as a communication bridge between applications so that the status of jobs can be monitored and requests are properly fulfilled without being lost.
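To see the mechanics in code, here is a hedged boto3 sketch (the queue name and message body are hypothetical) of the producer/consumer loop with the visibility timeout in play:

# Sketch of the SQS producer/consumer flow with boto3.
# The consumer must delete the message before the visibility timeout expires,
# otherwise it becomes visible to other consumers again.
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="photo-jobs")["QueueUrl"]

# Producer: enqueue a job.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"photo_id": 123}')

# Consumer: long-poll, process, then delete.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    WaitTimeSeconds=10,      # long polling
    VisibilityTimeout=30,    # hidden from other consumers while we work
)
for msg in resp.get("Messages", []):
    print("processing", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])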

Understanding Lambda and API Gateway

Next up is AWS Lambda and API Gateway.

Serverless cloud computing models automatically scale your application, so you no longer need to have infrastructure-related concerns about surges in requests. Within serverless architecture, there are Backend as a Service (BaaS) and Function as a Service (FaaS) models.

  1. BaaS hosts and replaces a single component as a whole (e.g. Firebase Authentication, AWS Amplify).
  2. FaaS is a type of service in which the application is broken down into individual functions, and each function is individually hosted by the provider (e.g. AWS Lambda).

It’s not that one approach is better than the other; they are just two different approaches to serverless architecture. AWS Lambda is a typical example of FaaS-based architecture: we have a minimal function that gets invoked by certain predetermined events and is torn down as soon as the request has been processed.

How does Lambda work under the hood? A Lambda function runs inside a microVM (micro virtual machine). When an invocation is received, Lambda will launch a new microVM and load the code package in memory to serve the request. The time taken by this process is called startup time. MicroVMs are a new way of looking at virtual machines. Rather than being general purpose and providing all the potential functionality a guest operating system may require, microVMs seek to address the problems of performance and resource efficiency by specializing for specific use cases. By implementing a minimal set of features and emulated devices, microVM hypervisors can be extremely fast with low overhead. Boot times can be measured in milliseconds (as opposed to minutes for traditional virtual machines).

Memory overhead on MicroVMs can be as little as 5MB of RAM, making it possible to run thousands of microVMs on a single bare metal server. Think of these like transistors in a PC.

How does AWS Lambda connect with API Gateway?

AWS Lambda functions are event-driven, and API Gateway exposes a REST API endpoint online which acts as a trigger, scaling the AWS Lambda function up and down.

2.1 - Simple example with Lambda and API Gateway

We now have all the preliminary knowledge to code something up. Let’s go to the AWS console and make a Lambda function and a corresponding API Gateway hook.

Choose the Python runtime on Lambda before making the API Gateway trigger.
Choose Lambda as the integration module for the API Gateway.
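The handler body itself can be very small. Here is a hedged sketch of a Python handler for an API Gateway proxy event; the request fields are assumptions about what the client sends, not a fixed contract:

# Minimal Lambda handler for an API Gateway (proxy integration) event.
import json

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }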

Docker part 3 - Going Beyond, Kubernetes

Improving the microservice applications using Kubernetes

How can you improve? Kubernetes

In the previous post, I created a simple service that summarizes news content using an ML model, Streamlit, and Docker Compose. Well, there must be ways to improve the application, as it was mostly built on top of what I already knew about Docker. What are the next steps for improving it? The direction is quite clear, as there are evident limitations with the current approach of using Docker Compose. It works for testing locally on my computer, but it is not scalable in production, as it is meant to run on a single host. To summarize:

  • You can implement health checks, but Compose won’t recreate containers when they fail (absence of self-healing)
  • Absence of a proper load balancer
  • Docker Compose is designed to run on a single host or cluster, while Kubernetes is more agile in incorporating multiple cloud environments and clusters.
  • Kubernetes is easier to scale beyond a certain point, plus you can utilize native services of AWS, Azure, and GCP to support the deployment.

Docker Swarm and Kubernetes are two possible options, but for scalability and compatibility, Kubernetes should be pursued.

Understanding Kubernetes

Kubernetes architecture can be simplified to the image below:


Containers: Package everything in Containers so that Kubernetes can run it as a service.

Node: A representation of a single machine in your cluster. In most production systems, a node will be either a physical machine in a datacenter or a virtual machine hosted on a cloud provider like GCP or AWS.

Cluster: A grouping of multiple nodes in a Kubernetes environment. Kubernetes runs orchestration on clusters to control/scale nodes and pods.

Pods: Kubernetes doesn’t run containers directly; instead it wraps one or more containers into a higher-level structure called a pod. Any containers in the same pod share the same resources and local network. Containers can easily communicate with other containers in the same pod as though they were on the same machine, while maintaining a degree of isolation from others. Because pods are scaled up and down as a unit, all containers in a pod must scale together, regardless of their individual needs. This leads to wasted resources and an expensive bill. To avoid this, pods should remain as small as possible, typically holding only a main process and its tightly coupled helper containers.

Understanding workloads

You have your applications running from Dockerfiles. In Kubernetes, these applications are referred to as workloads. Whether your workload is a single component or several that work together, on Kubernetes you run it inside a set of pods.

Deployment

Deployment is the most common kind of workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

You can define the behaviour of individual pods, but often it is better to define them as a Deployment. The Deployment creates a ReplicaSet that creates three replicated Pods, indicated by the .spec.replicas field. You can choose to declare a ReplicaSet directly, but a Deployment is a higher-level concept and is usually the preferred way to define things.
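If you prefer to poke at the cluster from code rather than kubectl, the official Kubernetes Python client can read the same objects. A sketch, assuming a local kubeconfig (e.g. the one Minikube creates):

# Sketch: list Deployments and their replica counts with the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig
apps = client.AppsV1Api()

for dep in apps.list_namespaced_deployment(namespace="default").items:
    ready = dep.status.ready_replicas or 0
    print(f"{dep.metadata.name}: {ready}/{dep.spec.replicas} replicas ready")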

Jobs

There are many kinds of workloads besides Deployment. Another useful type would be Jobs. A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. You can also use a Job to run multiple Pods in parallel. By default, each pod failure is counted towards the .spec.backoffLimit.

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

The code above runs a program that computes pi to 2000 digits, with a failure limit of 4 (.spec.backoffLimit). .spec.parallelism controls how many Pods run in parallel.

CronJob

A CronJob is another useful workload type, performing regularly scheduled actions such as backups, report generation, and so on. You can use these to perform ETL, and choose to make it more robust with tools like Airflow.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *" # refer to the website for controlling the schedule parameters.
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

Understanding services

Service is an essential part of Kubernetes.

When using a Kubernetes Service, each Pod is assigned an IP address, but these addresses may not be directly knowable or stable. The Service API lets you expose an application running in Pods so that it is reachable from outside your cluster, so it is an abstraction to help you expose groups of Pods over a network. Each Service object defines a logical set of endpoints (usually these endpoints are Pods) along with a policy about how to make those Pods accessible.

Ingress is not actually a type of Service. Instead, it is an entry point that sits in front of multiple services in the cluster. It can be defined as a collection of routing rules that govern how external users access services running inside a Kubernetes cluster. Ingress is most useful if you want to expose multiple services under the same IP address. It is a powerful way to handle things, but you need to set up an Ingress Controller (there are multiple providers, and it can be difficult).

The way you define a Service is similar to how you define workloads.

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

You define the communication protocol and the ports the Service will use to communicate.

And when defining manifest files, you can, and should, bind resources and Services together:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
      - containerPort: 80
        name: http-web-svc # define the name 

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc # target is the pod defined above.

A Service also provides load balancing. Clients call a single, stable IP address, and their requests are balanced across the Pods that are members of the Service. There are five general types of Service for load balancing.

  • ClusterIP
  • NodePort
  • LoadBalancer
  • ExternalName
  • Headless


ClusterIP

Internal clients send requests to a stable internal IP address. This is the default Service type.


apiVersion: v1
kind: Service
metadata:
  name: my-cip-service
spec:
  selector:
    app: metrics
    department: sales
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Kubernetes assigns a cluster-internal IP address to a ClusterIP Service. This makes the Service only reachable within the cluster. ClusterIP is most typically used for inter-service communication within the cluster, for example communication between the front-end and back-end components of your app.

NodePort

A NodePort Service is the most primitive way to get external traffic directly to your service. NodePort, as the name implies, opens a specific port on all the Nodes (the VMs), and any traffic that is sent to this port is forwarded to the Service. There are limitations, like only one Service per port, but this gives a lot of room for customization.

LoadBalancer

With a LoadBalancer Service, clients send requests to the IP address of a network load balancer, and the load balancer decides which Service each request should be directed to. The LoadBalancer Service is an extension of the NodePort Service, and each cloud provider (AWS, Azure, GCP, etc.) has its own native load balancer implementation. When you are using a cloud provider to host your Kubernetes cluster, you will almost certainly be using a LoadBalancer. There are many different types of load balancers, and we will learn about them in the future.

Building my own examples

Well, that is enough basic knowledge to get started. Now it is time to get our hands dirty with our own examples. In the previous two posts of the series, I made examples using NLP models. But that isn’t really related to my profession, as I work in the field of Computer Vision.

So in this example, I am going to:

  • Train a small image classifier that can run Inference on CPU
  • Serve the models using FastAPI, just as I served the NLP news models.
  • Scale up the workloads using Kubernetes
  • Use Minikube instead of deploying on the cloud for testing first.
  • Later the same architecture will be hosted in the cloud (AWS)

What is Minikube?

Minikube is most typically used for testing Kubernetes. A Kubernetes cluster can be deployed on either physical or virtual machines (Nodes). To get started with Kubernetes development, you can use Minikube. Minikube is a lightweight Kubernetes implementation that creates a VM on your local machine and deploys a simple cluster containing only one node. Minikube is available for Linux, macOS, and Windows systems. The Minikube CLI provides basic bootstrapping operations for working with your cluster, including start, stop, status, and delete.

Assuming you have Linux,

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start
minikube stop

Docker part 2 - Implementing monitoring to improve applications

Improving the microservice applications

Monitoring the containers

It is important to be able to monitor the containers. Containerized apps are more dynamic: they may run across dozens or hundreds of containers that are short-lived and are created or removed by the container platform. You need a monitoring approach that is container-aware, with tools that can plug into the container platform for discovery and find all the running applications without a static list of container IP addresses. Unfortunately there isn’t one silver-bullet framework that can handle all the monitoring by itself (at least among those that are open source), but by combining multiple frameworks, you can create a very solid monitoring environment. Monitoring becomes increasingly important when you later try to scale the app using Kubernetes.

A. Prometheus

Think of Prometheus as the heart of application monitoring

Prometheus is an open-source systems monitoring and alerting toolkit. It is a self-hosted set of tools which collectively provide metrics storage, aggregation, visualization, and alerting. Think of it as a time series database that you can query.

B. Grafana

Prometheus metrics are not really human friendly. We need Grafana to visualize it. Think of it as Matplotlib.

While Prometheus provides a well-rounded monitoring solution, we need dashboards to visualize what’s going on within our cluster. That’s where Grafana comes in. Grafana is a fantastic tool that can create outstanding visualizations. Grafana itself can’t store data, but you can hook it up to various sources to pull metrics from, including Prometheus.

C. Alert manager

Have you seen automated Slack messages about your server? Well, that all comes from this.

(This is optional)

Prometheus has got first-class support for alerting using AlertManager. With AlertManager, you can send notifications via Slack, email, PagerDuty, and tons of other mediums when certain triggers go off.

D. Node Exporter

It exposes stats like CPU usage, disk stats, and so on.

Node Exporter, as the name suggests, is responsible for exporting hardware and OS metrics exposed by our Linux host. It enables you to measure various machine resources such as memory, disk, and CPU utilization. The Node Exporter is maintained as part of the Prometheus project. This is a completely optional step and can be skipped if you do not wish to gather system metrics. Think of it as a module to gather more data; node_exporter is designed to monitor the host system.

E. CAdvisor

cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers.

Node Exporter is for exporting local system metrics. For Docker (and Kubernetes), you will need cAdvisor. It essentially does the same thing as Node Exporter, but collects data for individual containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage, and network statistics. This data is exported by container and machine-wide.

Changing Docker Compose for Monitoring

Now that we know what each component does, it is time to make the corresponding changes. The code can be found here. Note that quite a few files need to be changed, and it takes some time to digest what is going on. First of all, let’s look at the whole Docker Compose file. You will notice that it has become much longer, so let’s break it down component by component.

# docker-compose.yml
version: '3.8'
networks: 
  monitoring: 
    driver: bridge 

# Persist data from Prometheus and Grafana with Docker volumes
volumes:
  prometheus_data: {}
  grafana_data: {}

x-logging:
  &default-logging
  driver: "json-file"
  options:
    max-size: "1m"
    max-file: "1"
    tag: ""

First of all, a bridge network is a Link Layer device which forwards traffic between network segments. A bridge can be a hardware device or a software device running within a host machine’s kernel. Docker Compose understands the idea behind running services for one application on one network. When you deploy an app using Docker Compose file, even when there’s no mention of specific networking parameters, Docker Compose will create a new bridge network and deploy the container over that network. Because they all belong to the same network, all our modules can easily talk to each other. There are other network modes like overlay used for Swarm, but we are mainly interested in bridge. Here, a persistent volume for both Prometheus and Grafana is defined, and also some logging components are initialized.

Configuring Grafana

Next up, we define our first service, Grafana, the visualization dashboard.

services:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_USERS_DEFAULT_THEME=light
      - GF_LOG_MODE=console
      - GF_LOG_LEVEL=critical
      - GF_PANELS_ENABLE_ALPHA=true
    volumes:
      - ./configs/grafana/provisioning/dashboards.yml:/etc/grafana/provisioning/dashboards/provisioning-dashboards.yaml:ro
      - ./configs/grafana/provisioning/datasources.yml:/etc/grafana/provisioning/datasources/provisioning-datasources.yaml:ro
      - ./dashboards/node-metrics.json:/var/lib/grafana/dashboards/node-metrics.json:ro
      - ./dashboards/container-metrics.json:/var/lib/grafana/dashboards/container-metrics.json:ro
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    networks:
      - monitoring
    logging: *default-logging

For Grafana, I am defining some config files for dashboards and datasources, which will be used by Grafana to display content. These, however, do not have to be populated prior to provisioning; they can be generated after the dashboard is created, which will be discussed in more detail later. And obviously the data rendered by Grafana all comes from Prometheus, so it naturally depends on it. The convention is to use port 3000 for Grafana.

Configuring Prometheus

Next up is the section for Prometheus. This is probably the most important component, as it scrapes data from the various other components.

prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--log.level=error'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.external-url=http://localhost:9090'
    volumes:
      - ./configs/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./configs/prometheus/recording-rules.yml:/etc/prometheus/recording-rules.yml
      - ./configs/prometheus/alerting-rules.yml:/etc/prometheus/alerting-rules.yml
      - prometheus_data:/prometheus
    depends_on:
      - alertmanager
    ports:
      - 9090:9090
    networks:
      - monitoring
    logging: *default-logging

The most important part of this section is that I am referring to a prometheus.yml file, which gets mounted into the container. This file sets the rules for where Prometheus should look for data.

global:
  scrape_interval: 5s
  external_labels:
    namespace: local

rule_files:
  - /etc/prometheus/recording-rules.yml
  - /etc/prometheus/alerting-rules.yml

alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: [ 'localhost:9090' ]
        labels:
          container: 'prometheus'

  - job_name: alertmanager
    static_configs:
      - targets: [ 'alertmanager:9093' ]
        labels:
          container: 'alertmanager'

  - job_name: node-exporter
    static_configs:
      - targets: [ 'node-exporter:9100' ]
        labels:
          container: 'node-exporter'

  - job_name: cadvisor
    static_configs:
      - targets: [ 'cadvisor:8080' ]
        labels:
          container: 'cadvisor'

  - job_name: backend
    static_configs:
      - targets: [ 'backend:5000' ]
        labels:
          container: 'backend'

You can feed in IPs or component names as targets.

Configuring CAdvisor

Next up are the cAdvisor and Redis services. cAdvisor requires a Redis server to be initialized. cAdvisor will gather container metrics from this container automatically, i.e. without any further configuration.

cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - monitoring
    logging: *default-logging

redis:
    image: redis:latest
    container_name: redis
    ports:
      - 6379:6379
    cpus: 0.5
    mem_limit: 512m
    logging: *default-logging

Configuring Node Exporter

Finally, Node Exporter and Alertmanager. Alertmanager is optional; we will dig deeper into its customization in the future.


node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points'
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    networks:
      - monitoring
    logging: *default-logging

  alertmanager:
    image: prom/alertmanager:${ALERTMANAGER_VERSION:-v0.25.0}
    container_name: alertmanager
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--log.level=error'
      - '--storage.path=/alertmanager'
      - '--web.external-url=http://localhost:9093'
    volumes:
      - ./configs/alertmanager/alertmanager-fallback-config.yml:/etc/alertmanager/config.yml
    ports:
      - 9093:9093
    networks:
      - monitoring
    logging: *default-logging

Grafana Dashboard

Log in and change your password.

Okay, finally all the components have been defined. There are quite a few details for each component, and understanding everything that’s happening behind the scenes is quite difficult. But I understand the basics now (the purpose of each part, and how it should be configured), so that’s enough to get going. Build with Docker Compose, and go to localhost:3000. At the beginning your username and password should both be admin. Add sources, and based on the source, you can create any dashboard. You can look for various dashboard options here.

Data for our Grafana dashboard will persist, because we generated the volume for it.
Metrics for containers

One thing to note is that the source URL pointing at Prometheus (which scrapes all the data) must be http://prometheus:9090. I tried pointing it at http://localhost:9090 and was getting a connection error, and it took a while to figure that out.

Add Backend data to Grafana

Awesome, I see some metrics being generated in Grafana, but where are the metrics for the backend? I did ask Prometheus to scrape data from the backend, but they are missing in Grafana. Well, they are missing because I never asked the backend to generate metrics in the format that Prometheus expects.

Prometheus can collect multiple types of data, and you can find information about them here. The Prometheus client libraries offer four core metric types:

  1. Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
  2. Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also “counts” that can go up and down, like the number of concurrent requests.
  3. Histogram: A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
  4. Summary: Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.

Normally, in order to have your code generate the metrics above, you would need to use the client libraries provided by Prometheus. There is a general client library for Python as well. However, it is much easier to use a higher-level library specifically designed for FastAPI, like Prometheus FastAPI Instrumentator. Update your requirements.txt to add the instrumentator. Now you need to change the backend code as follows:

"""Simple scrapper that randomly scraps a website and gets sentences from it."""
from fastapi import FastAPI
from scrape import get_random_news
from summarize import summarize_random_news, Data
from prometheus_fastapi_instrumentator import Instrumentator
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/api/v1/get/news", status_code=200)
async def get_news():
    """Simple get function that randomly fetches a news content."""
    return get_random_news()


@app.get("/api/v1/get/summary", status_code=200)
async def get_summary(data: Data):
    """Simple get function that randomly fetches a news content."""
    return summarize_random_news(data)


Instrumentator().instrument(app).expose(app)

In order to use the Instrumentator, you need to add middleware to the FastAPI application. A middleware is a function that works with every request before it is processed by any specific path operation, and also with every response before returning it. CORS in CORSMiddleware stands for Cross-Origin Resource Sharing; it allows us to relax the security applied to an API. This is enough to get the basics. You can import this metrics.json file to view stats related to the backend.
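If you later want application-specific metrics beyond the defaults exposed by the instrumentator, you can add them with the plain Prometheus client library. A hedged sketch, modifying the endpoint defined above (the metric name is my own):

# Sketch: a custom Counter alongside the instrumentator's default metrics.
# This assumes the FastAPI `app` and `get_random_news` from the code above.
from prometheus_client import Counter

NEWS_REQUESTS = Counter("news_requests_total", "Number of news fetches served")

@app.get("/api/v1/get/news", status_code=200)
async def get_news():
    """Fetch a random news item and count the request."""
    NEWS_REQUESTS.inc()
    return get_random_news()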

If you trigger the backend from the frontend a couple of times, you will see stats being recorded in Grafana.

Awesome, now we are ready to start looking into Kubernetes in the next post.

Docker part 1 - Creating simple application with Docker Compose

Basics of Distributed Computing

Docker. How much do you know?

If you are a software engineer with some experience, it’s very likely that you already have some experience with Docker. Distributed systems and containers have taken over the world, and essentially every application out there is containerized these days. Naturally, there is an immense amount of content associated with Docker beyond simple image builds and pulls, and a lot of sophisticated tools like Kubernetes, Terraform, and Ansible operate on top of containers. I have used Docker here and there, and I thought it would be good to check how much I know about it and fill in some gaps.

The difference between VMs and Docker containers.

Docker is an open-source platform that offers container services. Containers behave like lightweight virtual machines (VMs): they leverage features of the host operating system to isolate processes and control the processes’ access to CPUs, memory, and disk space. Unlike VMs, which use a hypervisor to virtualize physical hardware, containers virtualize the operating system (typically Linux or Windows), so each individual container contains only the application and its libraries and dependencies, allowing applications to run on any operating system without conflicts. And typically an application would have more than one container running: it would consist of multiple container instances (i.e. microservices) that are in charge of different aspects of the app, with images hosted in a registry like Elastic Container Registry (ECR), talking to each other over a virtual network without being exposed to the internet.

Basics of Docker

Below is an example taken from the book Learn Docker in a Month of Lunches.

Okay now we reviewed what Docker is, and why we are using it. Let’s do the very basics according to the outline of the book. Assuming you have Docker installed in your environment already, let’s dive straight to it. I won’t even bother pulling pre-existing images and doing interactions with them, because I assume that we all already know how to do that.

#pull the docker images hosted on dockerhub already.
$ docker image pull diamol/ch03-web-ping

#run the ping application
$ docker container run -d --name web-ping diamol/ch03-web-ping

#check the logs collected by Docker
$ docker container logs web-ping

# tear down the container
$ docker rm -f web-ping

# run a version that pings to specific target, google in this case. 
$ docker container run --env TARGET=google.com diamol/ch03-web-ping

Now, to reproduce the above ourselves, we obviously need to write a Dockerfile. We can simply create one mirroring the image we pulled from Docker Hub. Refer to the Dockerfile:

FROM diamol/node
ENV TARGET="blog.sixeyed.com"
ENV METHOD="HEAD"
ENV INTERVAL="3000"
WORKDIR /web-ping
COPY app.js .
CMD ["node", "/web-ping/app.js"]

This copies the app.js file and executes the code via the CMD instruction. You can see that the app.js file reads the environment variables defined in the Dockerfile:

const options = {
  hostname: process.env.TARGET,
  method: process.env.METHOD
};

Build the image, and run it the same way as before:

$ docker image build --tag test .
$ docker container run --env TARGET=google.com test

Very easy. Next up is a multi-stage Dockerfile for a Java application.

Working with multi-stage Dockerfile

Each stage starts with a FROM instruction, and you can optionally give a stage a name with the AS parameter. This example uses Java: Maven is a build automation tool used primarily for Java projects, and OpenJDK is a freely distributable Java runtime and developer kit. Maven uses an XML format to describe the build, and its command line is called mvn. A multi-stage build lets us chain the whole process together, compiling and packaging the application in one go.

osi
You don't need to know Java. The point is to see how the data flows.

First pull the base images.

$ docker pull diamol/maven
$ docker pull diamol/openjdk

# builder stage
FROM diamol/maven AS builder
WORKDIR /usr/src/iotd
COPY pom.xml .
RUN mvn -B dependency:go-offline
COPY . .
RUN mvn package

# app
FROM diamol/openjdk
WORKDIR /app
COPY --from=builder /usr/src/iotd/target/iotd-service-0.1.0.jar .
EXPOSE 80
ENTRYPOINT ["java", "-jar", "/app/iotd-service-0.1.0.jar"]
  1. The builder stage starts by creating a working directory in the image and then copying in the pom.xml file, which is the Maven definition of the Java build.
  2. The first RUN statement executes a Maven command that fetches all the application dependencies. This is an expensive operation, so it gets its own step to make use of Docker layer caching: if there are new dependencies, the XML file changes and the step runs again; if the dependencies haven't changed, the layer cache is used.
  3. Next the rest of the source code is copied in. COPY . . means "copy all files and directories from the location where the Docker build is running into the working directory in the image."
  4. mvn package compiles and packages the application, producing a JAR file.
  5. The app stage copies the compiled JAR file from the builder stage.
  6. The app stage exposes port 80.
  7. ENTRYPOINT is an alternative to CMD, and it runs the compiled Java application.

Distributed Application: My Python Example - News Summarizer

Most applications don't run as one single component. Even large old apps are typically built as frontend and backend components, which are separate logical layers running in physically distributed components. Docker is ideally suited to running distributed applications, from n-tier monoliths to modern microservices. Each component runs in its own lightweight container, and Docker plugs them together using standard network protocols. This is where Docker Compose comes in.

One thing that I absolutely hate doing is blindly following the textbook and running code that someone else has written for you. This does not help learning in any way, especially when the examples are not written in Python (they are in Java, JavaScript, etc.). The best thing to do is to define your own example. This is a very simple application, with only two parts: a backend and a frontend. The code can be found in this repository. Here is the general outline:

app/
├── config.env
├── docker-compose.yml
├── front/
│   ├── __init__.py
│   ├── Dockerfile
│   ├── main.py
│   └── requirements.txt
└── webscrapper/
    ├── __init__.py
    ├── Dockerfile
    ├── main.py
    ├── requirements.txt
    ├── scrape.py
    └── summarize.py

Backend (webscrapper)

The backend is written using FastAPI, as I have already gotten quite familiar with it while reviewing RESTful application structure. There are two parts to the backend of our application:

  1. A web scraper that randomly scrapes a news article and its content from Daily Mail
  2. A Hugging Face machine learning model that summarizes the news content.

FrontEnd

And finally, a lightweight frontend module built with Streamlit. Streamlit turns simple scripts into shareable web apps literally in minutes. Of course there are certain limitations, so it cannot compete against full-stack frameworks like Django or more conventional JS frameworks like Vue and React, but for fast application development to test ideas, Streamlit is a clear winner.

osi
Streamlit turns simple Python code into a web app. There is almost no learning curve, and it's one of the easiest tools out there.

The overall idea is to have a very simple frontend page that loads and shows a news article on button click (which calls the webscrapper backend), and another button that turns the loaded article into an input for the ML summarization model, displaying the summary on the frontend when the inference finishes.

News Summarizer backend code

Let's look into the details of the backend. The very first piece of logic loads the news article. Python has an HTML parsing library called BeautifulSoup, and all I am doing is reading the Daily Mail front page and randomly selecting one news article to generate an input.

import requests
from bs4 import BeautifulSoup
import numpy as np
import re
import random


def get_random_news():
    url = "https://www.dailymail.co.uk"
    r1 = requests.get(url)

    coverpage = r1.content
    soup1 = BeautifulSoup(coverpage, "html5lib")
    coverpage_news = soup1.find_all("h2", class_="linkro-darkred")

    final_data = {}
    list_links = []

    # choose a random article index (randrange avoids the off-by-one of randint)
    n = random.randrange(len(coverpage_news))

    # Getting the link of the article
    link = url + coverpage_news[n].find("a")["href"]
    list_links.append(link)

    # Getting the title
    title = coverpage_news[n].find("a").get_text()

    # Reading the content (it is divided in paragraphs)
    article = requests.get(link)
    article_content = article.content
    soup_article = BeautifulSoup(article_content, "html5lib")
    body = soup_article.find_all("p", class_="mol-para-with-font")

    # Unifying the paragraphs (join once, after the loop)
    list_paragraphs = []
    for p in np.arange(0, len(body)):
        paragraph = body[p].get_text()
        list_paragraphs.append(paragraph)
    final_article = " ".join(list_paragraphs)

    # Removing special characters
    final_article = re.sub("\\xa0", "", final_article)

    final_data["title"] = title.strip()
    final_data["content"] = final_article

    return final_data

Now that the input is ready, I need an ML model that can run inference and generate summaries. The core ML model is just a generic summarization NLP model from Hugging Face. I did not bother retraining or optimizing the model, as the purpose of this example is to practice Docker, not to create a state-of-the-art model. FastAPI uses Pydantic for data validation, so I created a very basic Data class here as well.

from transformers import pipeline
import warnings
from pydantic import BaseModel

warnings.filterwarnings("ignore")

class Data(BaseModel):
    """Data model for the summary. Keep it simple."""

    content: str


def summarize_random_news(content: Data) -> dict:
    """Summarizes the given news content."""
    # loading the pipeline on every call is slow, but keeps the example simple
    summarizer = pipeline("summarization", model="stevhliu/my_awesome_billsum_model")
    summarized_content = summarizer(content.content)
    actual_summary = summarized_content[0]["summary_text"]

    return {"summary": actual_summary}

if __name__ == "__main__":
    # quick local sanity check with a dummy payload
    summary = summarize_random_news(Data(content="Some long article text to summarize."))
    print(summary)

It is possible to put these two functions into two different services, but to keep things simple, I have decided to group them together under a single backend service. And of course I need the FastAPI main script to provide an endpoint for each of the two functions above. I do not have any DB set up at the moment, so there is no POST, DELETE, or PUT; there is only a simple GET endpoint for each function. There are no parameters for get_news, as it just generates a random news input. This input is passed to get_summary, and Pydantic validates that it is a proper string.

"""Simple scrapper that randomly scraps a website and gets sentences from it."""
from fastapi import FastAPI
from scrape import get_random_news
from summarize import summarize_random_news, Data

app = FastAPI()


@app.get("/api/v1/get/news", status_code=200)
async def get_news():
    """Simple get function that randomly fetches a news content."""
    return get_random_news()


@app.get("/api/v1/get/summary", status_code=200)
async def get_summary(data: Data):
    """Simple get function that randomly fetches a news content."""
    return summarize_random_news(data)

News Summarizer frontend code

Everything in the frontend is defined with Streamlit. Here, I need to manage the state of the news data and change it with the buttons.

import streamlit as st
import pandas as pd
import requests

import os

# init session state variables
if "news" not in st.session_state:
    st.session_state.news = "Click the button to get news"

if "summary" not in st.session_state:
    st.session_state.summary = "Click the button to get summary"

Then I need equivalent frontend functions to trigger the backend endpoints. Here, hardcoding the backend endpoint is a very bad practice: what if the URL changes? Then you would need to change every single occurrence. The sensible way to do it is to declare the values when building the Docker Compose file and read them from environment variables.

def display_news():
    """Called upon onclick of get_news_button"""
    if "BACKEND_PORT" not in os.environ:
        backend_port = 5000
    else:
        backend_port = str(os.getenv("BACKEND_PORT"))

    # get the news from backend
    response = requests.get(f"http://backend:{backend_port}/api/v1/get/news")

    # response successful return the data
    if response.status_code == 200:
        content = response.json()
        text = content["content"]
        title = content["title"]
        return f"{title}\n\n{text}"

    else:
        return "Port set properly, but backend did not respond with 200"


def get_summary():
    """Called upon onclick of summarize_button"""

    if st.session_state.news == "Click the button to get news":
        return "Get news first, and then ask to summarize"

    if "BACKEND_PORT" not in os.environ:
        backend_port = 5000
    else:
        backend_port = str(os.getenv("BACKEND_PORT"))

    # get the news from backend using json obj (based on Pydantic definition).
    json_obj = {"content": st.session_state.news}
    response = requests.get(
        f"http://backend:{backend_port}/api/v1/get/summary", json=json_obj
    )

    if response.status_code == 200:
        content = response.json()
        summary = content["summary"]
        return summary
    else:
        return "Port set properly, but backend did not respond with 200"

The last part is the actual front-end portion of the page, which can be defined in literally a few lines. Creating buttons, dividers, etc. is very easy. The buttons trigger state changes, and the bottom text sections display whatever the current session state variables are set to.

st.title("Summarize News using AI 🤖")

st.markdown(
    "Microservice test application using FastAPI (backend), HuggingFace (inference), and Streamlit(frontend). Using Docker Compose to turn into microservice. To get a news article, hit the Get News Article. To summarize the article, hit the Summarize button.",
    unsafe_allow_html=False,
    help=None,
)

# four columns, used only to center the two buttons
col1, col2, col3, col4 = st.columns(4)


with col1:
    pass
with col2:
    get_news_button = st.button("Get News")
with col3:
    summarize_button = st.button("Summarize")
with col4:
    pass

if get_news_button:
    st.session_state.news = display_news()

if summarize_button:
    st.session_state.summary = get_summary()

st.header("News Article", help=None)
st.markdown(st.session_state.news)
st.divider()
st.subheader("Summary")
st.markdown(st.session_state.summary)

News Summarizer Dockerfiles

And of course there are the Docker Compose file and the Dockerfiles that glue everything together and actually run things in harmony. First of all, the Docker Compose file that wraps things together. Here, I am providing an ENV file so that I do not have to hardcode the port numbers for the frontend and the backend. Functionally, defining the command in Docker Compose or inside the Dockerfile makes no difference, so I am defining the command that starts the backend server in the Docker Compose file. Health checks are implemented to periodically check the status of the containers; note that there are limitations in terms of self-healing, which will be addressed in the next post.

# docker-compose.yml
version: '3.8'
services:
  backend:
    env_file:
      - .env
    build: ./webscrapper
    ports:
      - ${BACKEND_PORT}
    volumes:
      - ./webscrapper:/app
    healthcheck:
      test: curl --fail http://localhost:${BACKEND_PORT} || exit 1
      interval: 1m30s
      timeout: 30s
      retries: 5
      start_period: 30s
    command: uvicorn main:app --reload --host 0.0.0.0 --port ${BACKEND_PORT}
    networks:
      default:
        aliases:
            - backend

  streamlit-app:
    env_file:
      - .env
    build: ./front
    container_name: streamlitapp
    depends_on:
      - backend 
    ports:
      - ${FRONTEND_PORT}:${FRONTEND_PORT}
    working_dir: /usr/src/app
    healthcheck:
      test: curl --fail http://localhost:${FRONTEND_PORT} || exit 1   # check the Streamlit app itself
      interval: 1m30s
      timeout: 30s
      retries: 5
      start_period: 30s
  

You can see from the above that I am giving the backend service the network alias backend, so that I do not have to hardcode any IP addresses for the frontend to reach it. The frontend also depends on the backend: use depends_on to express the startup and shutdown dependencies between services, so the frontend always waits for the other services to be ready. For scaling, you need to make sure that you are not binding yourself to a specific host port. Otherwise you will get a message saying the port has already been allocated when you try to scale your application like below:

docker compose up -d --scale backend=3

If your ports are set up properly, the command above creates multiple instances of the backend. Again, there are limitations regarding load balancing with this approach, which will be addressed in the next post.

Front Dockerfile

And of course, you need the actual Dockerfiles for the frontend and the backend. I am defining the frontend start command inside the Dockerfile, but this could be defined inside the Docker Compose file as well.

# reduced python as base image
FROM python:3.8-slim-buster 

# set a directory for the app
WORKDIR /usr/src/app 

# copy all the files to the container
COPY . . 

# pip install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# curl is needed if you keep the curl-based healthcheck in docker-compose.yml
# RUN apt-get -y update; apt-get -y install curl

# expose the frontend port set in the environment (should match --server.port below)
EXPOSE ${FRONTEND_PORT} 

ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

Backend Dockerfile

The backend is even simpler. All that is happening here is copying the files over to the container and installing the requirements. The backend needs PyTorch, so you could technically use a PyTorch base image as well.


FROM python:3.9
# FROM nvcr.io/nvidia/pytorch:23.04-py3

WORKDIR /app

# copy requirements first so the pip install layer can be cached
COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

# copy everything else over to the container
COPY ./ /app/

Final Result

Based on the frontend design, the two buttons generate the input and the output. We can confirm that everything works as it should:

test

And the summary is generated perfectly as well.

test

Awesome, in the next post, I will try to improve this simple summary app.

Python Concurrency Part 1 - Multiprocessing

Note for parallel processing

Why is Python so slow?

With a background in Machine Learning, the natural choice of programming language for me was Python, which was (and still is) overwhelmingly popular, and the most dominant language in the field of Machine Learning. I have already spent years programming in Python, and I really love the Zen of Python. Like any other programming language, however, Python is not perfect. Python is innately slower than modern compiled languages like Go, Rust, and C/C++, because it sacrificed speed for easier usage and more readable code. But why is it slow exactly?

Common characteristics of Python are:

  • Interpreted language
  • Dynamically typed
  • GC Enabled Language

If you have ever studied a language like C++, you know of course that compiled languages have characteristics exactly opposite to the above. C compilers like GCC convert the source code into machine code and generate executables that are run directly by the CPU. Python, on the other hand, does not provide this compile step. The source code gets executed directly: Python compiles it into bytecode that is read by the interpreter, and the bytecode instructions are executed on a virtual machine, line by line. It can only be executed line by line because the language is dynamically typed. Because we don't specify the type of a variable beforehand, we have to wait for the actual value in order to determine whether what we're trying to do is actually legal. These errors cannot be caught at compile time, because the validity can only be checked by the interpreter at run time.
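A tiny illustration of this point (my own toy example): nothing checks the types in this function until the offending line actually runs.

def add(a, b):
    return a + b

print(add(1, 2))      # fine: prints 3
print(add(1, "2"))    # TypeError, but only discovered when this line executes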

When learning C/C++, something that always makes you bang your head against the wall is memory management: the notorious segmentation fault. Python has automatic garbage collection, so you never have to worry about things like pointers, but it's definitely not the most efficient way to handle memory. You can never shoot yourself in the foot, but it slows you down at the same time.
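As a small aside, CPython's automatic memory management is based on reference counting, which you can peek at with sys.getrefcount. The sketch below is purely illustrative; the reported count is one higher than you might expect because the function argument itself is a temporary reference.

import sys

data = [1, 2, 3]
print(sys.getrefcount(data))   # e.g. 2: the name 'data' plus the getrefcount argument

alias = data                   # a second name pointing at the same list
print(sys.getrefcount(data))   # one higher than before

del alias                      # dropping the reference lets the count fall again
print(sys.getrefcount(data))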

Global Interpreter Lock (GIL)

Another factor that slows Python down is the Global Interpreter Lock (GIL). By design, Python executes on a single thread at a time, on a single CPU core. Since the GIL allows only one thread to execute at a time, even in a multi-threaded program on a machine with more than one CPU core, you may wonder why in the world Python's developers created this feature.

etl
The Python GIL keeps the interpreter thread-safe, but slows things down

The GIL prevents race conditions where multiple threads try to access and modify a single point of memory, which causes memory leaks and all sorts of bugs that Python cannot handle. This was a practical solution, as Python manages memory for you, and early Python did not even have the concept of threading.

Languages like Rust guarantee thread safety (no multiple threads mutating the same variable at the same time) through their memory-safety model and standard library features like channels, mutexes, and Arc (Atomically Reference Counted) pointers. Python's GIL does not guarantee that your own code is thread-safe; it only ensures that one thread executes Python bytecode at a time. So is there a concept of multi-threading in Python? Yes, of course. This will be discussed in the next section.
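To make that concrete, here is a toy sketch of my own showing that a shared counter can still lose updates across threads, because counter += 1 is not an atomic operation even under the GIL.

import threading

counter = 0

def increment(n=1_000_000):
    global counter
    for _ in range(n):
        counter += 1  # read-modify-write: a thread switch in between loses updates

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# often less than 2_000_000, depending on the CPython version and switch interval
print(counter)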

Multiprocessing and Threading

In the previous few sections I have discussed the limitations of Python. Some of these problems are simply impossible to overcome due to the nature of Python, but Python has made continuous improvements in many areas, and thus it stands where it is right now. Many popular Python packages (and pretty much all ML libraries) are written in C to offer competitive speed, and Python now supports libraries for multiprocessing and threading, which give an additional boost to speed. This blog post will introduce various aspects of multiprocessing and multi-threading.

Many people mistakenly use multiprocessing and threading interchangeably, whereas they are not actually the same.

etl
Threading vs Multiprocessing

The illustration above is a very nice overview of the difference between threading and multiprocessing.

First of all, threading is a great way to speed up an application on I/O-bound problems, where threads within a single process run concurrently. For example, say you have a multi-stage pipeline in your application where the user interacts with the server. If there is a significant bottleneck in one of the stages and your process only runs sequentially, it will take much longer to finish all the stages, since the bottleneck stage needs to complete before moving on to the next one. However, if the stages run concurrently, the entire application should not take much longer than the time required to execute the bottleneck stage.
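To make the I/O-bound case concrete, here is a small sketch of my own using a thread pool, with time.sleep standing in for a slow network call; the total runtime is roughly the slowest task, not the sum of all of them.

import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(seconds):
    time.sleep(seconds)          # stands in for a network call or disk read
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_io, [1, 1, 1, 2]))   # the 2-second task is the bottleneck
print(f"done in {time.perf_counter() - start:.2f}s")  # roughly 2s, not 5s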

This is where multiprocessing comes in. When the program is CPU-intensive and doesn't have to do any I/O or user interaction, you can use multiple processes to dramatically speed up the computation by breaking the problem into chunks and delegating a chunk to each core. More processes can be created if your computer has more cores; it becomes less scalable when you spin up more processes than the number of cores in your computer.
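And a minimal sketch of the CPU-bound case (the sum-of-squares workload is an arbitrary stand-in of mine), splitting the chunks across one worker process per core:

import os
import multiprocessing as mp

def sum_squares(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [5_000_000] * os.cpu_count()   # one chunk per core
    with mp.Pool() as pool:                 # defaults to os.cpu_count() workers
        totals = pool.map(sum_squares, chunks)
    print(sum(totals))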

Python Multiprocessing

Now that we have covered the basics, it's time to dive in. In this post, I will cover the multiprocessing library, which is the core Python module for multiprocessing. I have already used multiprocessing to some extent for work (for a database migration), so it is easier to start from something that I am familiar with. The multiprocessing library offers four major parts:

  1. Process
  2. Locks
  3. Queues
  4. Pools
4 pillars
Process, Locks, Queues and Pools are the four major pillars of mp.

Process

A process is the basic unit of work in multiprocessing. Every process has its own copy of the Python interpreter and its own memory space, allowing multiple jobs to be executed simultaneously without conflicts. There are two types of processes:

  1. Parent Process: The main process that can start another child process.
  2. Child process: a process created by another process, also called a subprocess. It can have only one parent process. A child process may become orphaned if the parent process gets terminated.

A multiprocessing.Process instance provides a Python-based reference to this underlying native process. Let's try creating two processes that each run a function that sleeps.

import time
import multiprocessing as mp

def sleep(sleep_sec=1):
    process = mp.current_process()
    pid = process.pid
    print(f'process {pid}: Sleeping for {sleep_sec} seconds \n')
    time.sleep(sleep_sec)
    print(f'process {pid} Finished sleeping \n')

# create two process
start_time = time.perf_counter()
p1 = mp.Process(target=sleep)
p2 = mp.Process(target=sleep)

# Starts both processes
p1.start()
p2.start()

# measure the elapsed time right away; note that we never join() the children
finish_time = time.perf_counter()

# This must be called from the parent process
if mp.current_process().name == 'MainProcess':
    print(f"Program finished in {(finish_time - start_time):.3f} seconds")

But if you look at the output, you will notice something funny. One would expect the sleep and finish messages to be printed before the final "Program finished" line from the parent process. But instead, you get:

process 2354: Sleeping for 1 seconds 
Program finished in 0.006 seconds
process 2357: Sleeping for 1 seconds 
process 2354 Finished sleeping 
process 2357 Finished sleeping

Why is this happening?

4 pillars
calling the join() function explicitly blocks and waits for the other processes to terminate

Yes, it's because the processes are started without being joined, so the main process, which only takes a few milliseconds to execute, finishes without waiting for p1 and p2 to do their part. So if you fix this up:


import time
import multiprocessing as mp

# create two process
start_time = time.perf_counter()

# save all the process in a list
processes = []

# create 3 processes that all run the sleep() function defined above
for i in range(3):
    p = mp.Process(target=sleep)
    p.start()
    processes.append(p)

#join all the process
for p in processes:
    p.join()

finish_time = time.perf_counter()

# This must be called from the parent process
if mp.current_process().name == 'MainProcess':
    print(f"Program finished in {(finish_time - start_time):.3f} seconds")

And now you properly get:

process 6950: Sleeping for 1 seconds 
process 6953: Sleeping for 1 seconds 
process 6958: Sleeping for 1 seconds 
process 6950 Finished sleeping 
process 6953 Finished sleeping 
process 6958 Finished sleeping 
Program finished in 1.017 seconds

Okay, now we understand how processes are started and joined. But one thing to note is that there are actually three different start methods:

  1. spawn: start a brand-new Python process
  2. fork: copy a Python process from an existing process
  3. forkserver: a new server process from which future forked processes will be copied

You can set a specific start method like this:

import multiprocessing

if __name__ == '__main__':
    # set the start method
    multiprocessing.set_start_method('spawn')

This isn’t too important for now, so further details are not going to be discussed.

Locks

Next up is the Lock. We learned that a process is an encapsulated Python program. Processes often share data or resources, and a mutual exclusion lock (mutex) protects shared resources and prevents race conditions, since race conditions can easily corrupt data and create unwanted results. A lock ensures that data stays consistent between jobs by preventing another process from accessing the data until the lock is released.

Let's look at the multiprocessing.Lock class. It has two states: locked and unlocked.

import multiprocessing

# create a lock
lock = multiprocessing.Lock()
# acquire the lock (blocks until it is available)
lock.acquire()
# release the lock
lock.release()
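To see a lock doing something useful, here is a hedged sketch (a toy example of my own) where four processes increment a shared counter; without the lock around the read-modify-write, increments could be lost.

import multiprocessing as mp

def worker(counter, lock, n=10_000):
    for _ in range(n):
        with lock:                 # acquire/release around the critical section
            counter.value += 1

if __name__ == "__main__":
    lock = mp.Lock()
    counter = mp.Value("i", 0)     # shared integer in shared memory
    procs = [mp.Process(target=worker, args=(counter, lock)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # 40000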

A lock is more useful together with queues, so let's discuss queues now.

Queues

This is the same queue as in the data-structures sense, but here we are specifically referring to a first-in, first-out (FIFO) queue in the context of multiprocessing. Data can be placed on a queue and processed by another process when it becomes available, allowing us to break up tasks into smaller parts that can be processed simultaneously.
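Here is a minimal producer/consumer sketch of my own using multiprocessing.Queue, with None used as a simple end-of-stream marker:

import multiprocessing as mp

def producer(queue):
    for i in range(5):
        queue.put(i)               # items come out in the same order they go in
    queue.put(None)                # sentinel: tell the consumer we are done

def consumer(queue):
    while True:
        item = queue.get()         # blocks until an item is available
        if item is None:
            break
        print(f"consumed {item}")

if __name__ == "__main__":
    q = mp.Queue()
    p1 = mp.Process(target=producer, args=(q,))
    p2 = mp.Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()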

Pools

Fundamentals of Machine Learning Part 1: Convolution

Learning fundamental components of neural network in depth

History of Convolutional neural network

Neural networks have been around for a very long time: Marvin Minsky and Seymour Papert analyzed the limits of perceptrons in their 1969 book, and gradient-descent-based backpropagation was popularized in the 1980s. The potential of neural networks, however, has only been truly realized in the past decade, with the astonishing advancement of computers and the datasets that came with them. In terms of 2D image recognition, the famous ImageNet dataset was released in 2009, and the major breakthrough happened in 2012, when a convolutional neural network (CNN) designed by Alex Krizhevsky, published with Ilya Sutskever and Krizhevsky's PhD advisor Geoffrey Hinton, cut the existing error rate on ImageNet visual recognition nearly in half, down to 15.3 percent. The CNN was dubbed "AlexNet" in this legendary paper.

osi
The success of AlexNet changed the landscape of image recognition

The concept of ConvNets was first introduced in the 1980s, but AlexNet was the first clear demonstration that deeply stacked ConvNets could be so effective at image-related tasks. It has been 10 years since the AlexNet paper was first released, and nowadays transformer models are being adopted for vision tasks too, yet CNNs still remain mainstream due to their incredible efficiency.

Understanding CNN

When coding neural networks, we use CNN blocks all the time. But it is extremely important to understand the mathematics and the reason why they really work in practice. The convolution of two functions $f$ and $g$ is defined as \[(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau\]
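As a quick sanity check of the formula, here is a small sketch of my own (not from any framework) of the discrete 1-D version of this convolution, compared against np.convolve:

import numpy as np

def conv1d(f, g):
    """Discrete full convolution: out[t] = sum_tau f[tau] * g[t - tau]."""
    n = len(f) + len(g) - 1
    out = np.zeros(n)
    for t in range(n):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
assert np.allclose(conv1d(f, g), np.convolve(f, g))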
