In the previous post, I examined the continual learning aspects of the culling pipeline. But the most important logic of the pipeline is the Image Quality Assessment (IQA) portion, which quantifies the actual quality of the images based on characteristics like noise level, closed eyes, lighting, and various other composition metrics.
There are many things to consider when formulating IQA in the image-culling setting. Do we want to measure noise level and all the other metrics separately, or produce a single unified quality score? Do we need to create ensemble models? How are we going to generate the test datasets? But before any of these more complex ideas are addressed, it is important to see how IQA is done in general.
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
This is a paper that I came across at the CVPR 2023 conference.
Processing of 2D images is one of the most robust fields within machine learning due to the high availability of data. Image quality assessment (IQA) for 2D images is an important area within 2D image processing, and it has many practical applications, one being photo culling. Photo culling is a tedious, time-consuming process through which photographers select the best few images from a photo shoot for editing and final delivery to clients. Photographers take thousands of photos in a single event, and they often manually screen photos one by one, spending hours of effort. Criteria that photographers look for include:
Closed eyes
Noise level
Near duplicate images
Proper light
Rule of thirds
Poses, emotion
Machine learning and IQA
IQA is definitely one of the areas where machine learning algorithms can excel. However, there are many criteria (defined above) that it must fulfill, and the biggest hurdle is that because every photographer has different preferences, you cannot expect one pretrained model to satisfy all end users. We should implement a Continual Learning (CL) process so that we do not have to retrain the entire pipeline of models from scratch. When end users perform their own version of the culling process based on the suggestions of the pretrained model, this should become the new stream of data for CL training. The original IQA model should be tuned on this new data, with the aim of minimizing training time and catastrophic forgetting, where the model forgets what it learned before the CL process began.
Now, we need to think about the architecture of this entire pipeline. First, it may be difficult to combine IQA and near-duplicate detection (NDD) models together. Intuitively, an NDD model should break the images down into N sets, and within each set, IQA determines the quality of the images. And there must be a CL algorithm on top of everything. Ideally, both NDD and IQA should be continuously trainable; if not, at least the IQA portion of the pipeline should be.
Another thing to consider is EXIF data. EXIF (Exchangeable Image File Format) files store important data about photographs. Almost all digital cameras create these data files each time you snap a new picture. An EXIF file holds all the information about the image itself — such as the exposure level, where you took the photo, and any settings you used. Apart from the photo itself, EXIF data could hold important details for NDD and IQA. We must see whether there is any improvement in the results when we incorporate EXIF data. It could be totally unnecessary, just making the network heavier, but it could prove useful as well.
Now, in terms of the representation of the architecture, something like below could make sense:
For NDD and IQA, I already have some idea of how they would work even before doing proper research, but the idea of continual learning is very vague to me. How is it supposed to be implemented? Does it work on any architecture? What are the limitations?
I feel it would only make sense to look into IQA and NDD closely after having a better grasp of CL. Therefore, for the first part of the post, I will look into Continual Learning.
Lifelong Learning (LL)
If you think about how humans learn, we always retain the knowledge learned in the past and use it to help future learning and problem solving. When faced with a new problem or a new environment, we can adapt our past knowledge to deal with the new situation and also learn from it. The author of Lifelong Machine Learning defines the process of imitating this human learning capability as Lifelong Learning (LL), and states that terms like online learning and continual learning are subsets of LL.
Online Learning overview
What is online learning exactly? Online learning (OL) is a learning paradigm where the training data points arrive in a sequential order. This could be because the entire dataset is not available yet, for reasons such as annotation being too expensive, or, in cases like ours, because we need the users to generate the data.
When a new data point arrives, the existing model is quickly updated to produce the best model so far. If re-training on all available data were performed whenever a new data point arrives, it would be far too expensive, and it becomes impossible when the model is large. Furthermore, during re-training, the model being used could already be out of date. Thus, online learning methods are typically memory and run-time efficient due to the latency requirements of real-world scenarios.
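To make that concrete, here is a minimal sketch (my own illustration, not from any particular paper) of an online update for a linear model, where each incoming point triggers a single gradient step instead of a full re-training:

import numpy as np

# minimal online-learning sketch: one SGD step per incoming (x, y) pair,
# instead of re-training on the full dataset every time new data arrives
rng = np.random.default_rng(0)
w, b, lr = np.zeros(3), 0.0, 0.01


def online_update(x, y):
    """Update the linear model w.x + b with a single gradient step."""
    global w, b
    err = w @ x + b - y            # gradient of the squared error
    w -= lr * err * x
    b -= lr * err


# data "arrives" sequentially; the model is updated as each point comes in
for _ in range(1000):
    x = rng.normal(size=3)
    y = 2.0 * x[0] - 1.0 * x[2] + 0.5   # hypothetical ground-truth stream
    online_update(x, y)

print(w, b)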
Much of the online learning research focuses on one domain/task; the objective is to learn more efficiently from data arriving incrementally. LL, on the other hand, aims to learn from a sequence of different tasks, retain the knowledge learned so far, and use that knowledge to help future task learning. Online learning does not do any of these.
Continual learning overview
The concept of OL is simple: the sequence of data gets unlocked at every stage, and training is performed sequentially to make the model more robust. Continual learning is quite similar, but more sophisticated. It needs to learn various new tasks incrementally while sharing parameters with old ones, without forgetting.
Continual learning is generally divided into three categories:
Task Incremental learning
Let’s look at the first scenario. It is best described as the case where an algorithm must incrementally learn a set of distinct tasks. Often, the task identity is explicit, and it does not require a separate network. If a completely different kind of task is required (e.g. classification vs. object detection), it would need a separate network and weights, and there would be no forgetting at all.
The challenge with task-incremental learning, therefore, is not—or should not be—to simply prevent catastrophic forgetting, but rather to find effective ways to share learned representations across tasks, to optimize the trade-off between performance and computational complexity, and to use information learned in one task to improve performance on other tasks.
Now assume this scenario for MNIST:
So the objective here would be to ensure that learning Task N makes learning Task N+1 easier, and, for the above example, even after learning all the way up to Task 5, when asked to perform classification for Task 1, the model can still perform well without forgetting.
Thus when we have some Input space X and output space Y, with the task context C, we learn to perform \[f : X \times C \rightarrow Y\]
Domain Incremental learning
In this scenario, the structure of the problem is always the same, but the context or input-distribution changes.
It is similar to task-incremental learning in that in every training round n, we expect the model to learn from dataset n: \[D_n = \{(x_i, y_i)\}\]
But at test time, we do not know which “task” a sample came from. This is probably the most relevant case for our pipeline: the available labels (classes) remain the same, while the domain keeps changing, and we do not care to provide task context. Thus, intuitively, this is: \[f : X \rightarrow Y\]
Class Incremental learning
The final scenario is class-incremental learning. This is a more popular field of study within continual learning, and most continual learning papers at CVPR targeted class-incremental learning as well, because this is the area where catastrophic forgetting (CF) often happens most significantly. This scenario is best described as the case where an algorithm must incrementally learn to discriminate between a growing number of objects or classes.
Data arrives incrementally as batches of per-class sets, i.e. (\(X^1\), \(X^2\), …, \(X^t\)), where \(X^y\) contains all images from class y. Each round of learning from one batch of classes is considered a task \(T\). At each step \(T\), complete data is only available for the new classes \(X^t\), and only a small number of samples from previously learned classes are available in a memory buffer. If no rehearsal technique is used, the memory buffer is not available at all. The algorithm must map out the global label space, and at test time it has no idea which step the data came from.
And this is out of the scope of this research. Why?
The new data that users will be adding over time will span across multiple classes, and a lot of them will be from previously seen classes. The strict protocol of class-incremental learning would not make sense.
Single head vs Multi-head evaluation
The system should be able to differentiate the tasks and achieve successful inter-task classification without prior knowledge of the task identifier (i.e. which task the current data belongs to). In the case of single-head evaluation, the neural network output should consist of all the classes seen so far, so the output is evaluated on the global scale. In contrast, multi-head evaluation only deals with intra-task classification, where the network output only consists of a subset of all the classes.
For MNIST, say that the dataset is broken down by classes into 5 tasks: [{0,1}, {2,3}, …, {8,9}]
In the multi-head setting, we would know the task number (say Task 5), and we would only be making predictions between the subset {8,9}. Typically, task-incremental learning falls under this.
In the single-head setting, evaluation is done across all ten classes, so {0…9}. Typically, domain-incremental and class-incremental learning fall under this.
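As a toy illustration of the difference (hypothetical logits, not tied to any particular model):

import numpy as np

# hypothetical logits over all 10 MNIST classes from a trained network
logits = np.random.randn(10)

# multi-head: the task id is given (Task 5 -> subset {8, 9}),
# so the prediction is restricted to that subset
task_classes = [8, 9]
multi_head_pred = task_classes[int(np.argmax(logits[task_classes]))]

# single-head: no task id, so the prediction is over the global label space {0..9}
single_head_pred = int(np.argmax(logits))

print(multi_head_pred, single_head_pred)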
Types of OL/CL algorithms
My conclusion after this research is that although some authors make clear distinctions between Lifelong Learning, Online Learning, and Continual Learning, most use them interchangeably. Regardless of being OL or CL, these properties are shared:
Online model parameter adaptation: How to update the model parameters with the new data?
Concept drift: How to deal with data when the data distribution changes over time?
The stability-plasticity dilemma: If the model is too stable, it can’t learn from the new data. When the model is too plastic, it forgets knowledge learned from the old data. How to balance these two?
Adaptive model complexity and meta-parameters: The presence of concept drift or introducing a new class of data might necessitate updating and increasing model complexity against limited system resources
So the main challenge, as everyone agrees, is to make the model plastic enough to learn new data, but stable enough not to forget the previously gained knowledge. Finding this balance is the key to online learning. Now we look at some of the methods.
Replay methods
Replay methods are based on repeatedly exposing the model to the new data and to data on which it has already been trained. New data is combined with sampled old data, and the old data can be used for rehearsal (thus retaining training data of old sub-tasks) or as constraint generators. If the retained samples are used for rehearsal, they are trained together with samples of the current sub-task. If not, they are used directly to characterize valid gradients.
The main differences between replay methods lie in the following:
How examples are picked and stored
How examples are replayed
How examples update the weights
Drawbacks
The storage cost goes up linearly with the number of tasks. Yes, we are sampling from older data, but as rounds progress, more and more old data needs to be stored.
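One common way to keep the memory cost bounded is a fixed-size buffer filled with reservoir sampling; here is a rough sketch of the rehearsal idea (my own illustration, not from any specific paper):

import random


class ReplayBuffer:
    """Fixed-size memory of old examples, filled with reservoir sampling."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # every example seen so far keeps an equal chance of being retained
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.buffer[idx] = example

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))


# during training on a new sub-task, old samples are mixed into every batch:
# batch = new_batch + replay_buffer.sample(k=32)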
Parameter isolation methods try to assign certain neurons or network resources to a single sub-task. Dynamic architectures grow, e.g., with the number of tasks, so for each new sub-task new neurons are added to the network, which do not interfere with previously learned representations. This is not relevant to us, since we do not want to add additional tasks to the network.
Regularization methods
Regularization methods work by adding additional terms to the loss function. Parameter regularization methods study the effect of neural network weight changes on task losses and limit the movement of important ones, so that you do not deviate too much from what you learned previously.
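For example, an EWC-style quadratic penalty could be sketched like this (assuming PyTorch; old_params and importance are hypothetical dictionaries of parameter snapshots and importance weights computed after the previous task):

import torch


def regularized_loss(model, task_loss, old_params, importance, lam=100.0):
    """task_loss + lam * sum_i importance_i * (theta_i - theta_i_old)^2 (EWC-style sketch)."""
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return task_loss + lam * penalty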
Originally designed to transfer the learned knowledge of larger network (teacher) to a smaller one (student), knowledge distillation methods have been adapted to reduce activation and feature drift in continual learning. Unlike regularization methods that regularize the weights, KD methods regularize the network intermediate outputs.
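A distillation term along those lines could be sketched as follows (assuming PyTorch; the frozen old model provides the target logits on the new data):

import torch.nn.functional as F


def distillation_loss(new_logits, old_logits, T=2.0):
    """Penalize drift of the new model's outputs away from the frozen old model's outputs."""
    old_probs = F.softmax(old_logits / T, dim=1)          # soft targets from the old model
    new_log_probs = F.log_softmax(new_logits / T, dim=1)
    return F.kl_div(new_log_probs, old_probs, reduction="batchmean") * (T * T)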
Now that we have some general idea of how incremental learning works, let's look at a paper in depth. Learning without Forgetting, released in 2017, is probably one of the earliest and most famous works in this field. It may not be perfectly suitable for our scenario, where the label space overlaps between tasks and may or may not contain novel classes, but it is still important for understanding the general mechanism.
SequeL
This is based on this CVPR paper. SequeL is a flexible and extensible library for Continual Learning that supports both PyTorch and JAX frameworks. It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
It’s time to take things one step further and explore the features of Terraform to learn about Infrastructure as Code (IaC). But before we do so, let’s take a look at the components that will be the main building blocks of this tutorial: AMI and ECS.
If we want to launch virtual machine instances from a template, the AMI is the blueprint configuration for the instances, in effect a backup of an entire EC2 instance. When one of your machines dies in the pipeline, an AMI can be used to quickly spin up a replacement and fill the gap. During the AMI-creation process, Amazon EC2 creates snapshots of your instance’s root volume and any other Amazon Elastic Block Store (Amazon EBS) volumes attached to the instance. This allows you to replicate both an instance’s configuration and the state of all of the EBS volumes that are attached to that instance.
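AMI creation can also be scripted. As a rough sketch with boto3 (the instance ID, AMI name, and region below are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# create an AMI (and the underlying EBS snapshots) from a running instance
response = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # placeholder instance id
    Name="backend-ami-2023-07-01",      # placeholder AMI name
    Description="AMI baked from the backend instance",
    NoReboot=True,                      # snapshot without stopping the instance
)
print(response["ImageId"])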
Packer
All the dependencies and relevant packages that need to be installed into the AMI require constant updates, and you do not want to create the AMI manually. You want to be able to look for updates and, if there are any, automatically create the images using tools like Packer or EC2 Image Builder. Packer is the better alternative when using Terraform, as it's also built by HashiCorp and can be used across different cloud providers.
ECS? Why not EKS?
We have the AMI ready, built by Packer. What is next? We need ECS to use the images to spin up virtual machines and run containers within these machines to efficiently handle the jobs in the backend and deliver results back to the clients.
Let’s check what ECS is conceptually. AWS Elastic Container Service (ECS) is Amazon’s homegrown container orchestration platform. Container orchestration, that sounds awfully familiar, doesn’t it? Yes, Kubernetes. AWS has over 200 services, and quite a few of them are related to containers. Why not use Elastic Kubernetes Service or AWS Fargate?
EKS
As we explored in the previous tutorial, Kubernetes (K8s) is a powerful tool for container orchestration, and is the most prevalent option in the DevOps community. Amazon Elastic Kubernetes Service (EKS) is Amazon's fully managed Kubernetes-as-a-Service platform that enables you to deploy, run, and manage Kubernetes applications, containers, and workloads in the AWS public cloud or on-premises.
Using EKS, you don’t have to install, run, or configure Kubernetes yourself. You just need to set up and link worker nodes to Amazon EKS endpoints. Everything is scaled automatically based on your configuration, deploying EC2 machines and Fargate containers based on your layout.
ECS
What about ECS? ECS is a fully managed container service that enables you to focus on improving your code and service delivery instead of building, scaling, and maintaining your own Kubernetes cluster management infrastructure. Kubernetes can be very handy but difficult to coordinate, as it requires additional layers of logic. ECS ultimately reduces management overhead and scaling complexity, while still offering some of the benefits of container orchestration.
ECS is simpler to use, requires less expertise and management overhead, and can be more cost-effective. EKS can provide higher availability for your service, at the expense of additional cost and complexity. Thus my conclusion is: setting features like the Kubernetes control plane aside, if you are not dealing with a service where availability is the most crucial aspect, and cost matters more, ECS should be the preferred option.
ECS more in depth
Okay, now it makes sense why we want to go for ECS for now, to get things up to speed simply with Docker container images. Let's dig deeper into various aspects of ECS. The important concepts to understand are Cluster, Task, and Service within ECS.
Cluster
ECS runs your containers on a cluster of Amazon EC2 (Elastic Compute Cloud) virtual machine instances pre-installed with Docker. In our case, we might ask to use the ECS GPU-optimized AMI in order to run deep-learning related applications, like the NVIDIA Triton Server. An Amazon ECS cluster groups together tasks and services, and allows for shared capacity and common configurations. These statistics can be integrated and monitored closely in AWS CloudWatch.
A Cluster can run many Services. If you have multiple applications as part of your product, you may wish to put several of them on one Cluster. This makes more efficient use of the resources available and minimizes setup time.
Task
This is the blueprint describing which Docker containers to run, and it represents your application. In our example, it would be two containers. It details the images to use, the CPU and memory to allocate, environment variables, ports to expose, and how the containers interact.
A Service is used to guarantee that you always have some number of Tasks running at all times. If a Task's container exits due to an error, or the underlying EC2 instance fails and is replaced, the ECS Service will replace the failed Task. This is why we create Clusters: so that the Service has plenty of resources in terms of CPU, memory, and network ports to use.
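Before the Terraform sample below, here is a rough boto3 sketch of the same idea (the cluster, service, and task definition names are placeholders, and the cluster and task definition are assumed to already exist):

import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# a Service keeps a desired number of copies of a Task running on the Cluster
ecs.create_service(
    cluster="my-cluster",              # placeholder cluster name
    serviceName="backend-service",     # placeholder service name
    taskDefinition="backend-task:1",   # placeholder task definition (family:revision)
    desiredCount=2,                    # ECS replaces failed Tasks to keep this count
    launchType="EC2",
)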
Here is a sample of an ECS service defined in Terraform for MongoDB.
Okay, a Task is our application and a Service ensures our Tasks keep running properly. When requests come in, how do we know which ECS instance, or which Task, a request should be allocated to in order to maximize efficiency?
When the virtual machines are created, they are registered to the cluster. Each cluster continuously monitors the overall resource usage. Amazon ECS can manage the scaling of Amazon EC2 instances that are registered to your cluster. This is referred to as Amazon ECS cluster auto scaling.
When you use an Auto Scaling group capacity provider with managed scaling turned on, you set a target percentage (the targetCapacity) for the utilization of the instances in this Auto Scaling group. Amazon ECS then manages the scale-in and scale-out actions of the Auto Scaling group based on the resource utilization that your tasks use from this capacity provider.
AWS has hundreds of different products, and Simple Queue Service (SQS) is one of them (one of the earliest ones that became available). I have written unit tests for SQS-based services, but I realized that I did not know exactly how they operate, nor the exact reason for using them. Having created a test AWS account, I thought it would be great to build some examples and understand the mechanism underneath. On top of SQS, I am also going to explore AWS Lambda and API Gateway to build a sample application.
Understanding SQS
We first need to understand why the heck we need SQS. I built a couple of test applications and they seem to work just fine?
Why do we need SQS?
Applications nowadays are collections of distributed systems, and to efficiently pass data or logic between these services, you need a distributed message broker service, which helps establish reliable, secure, and decoupled communication. One of the best features of SQS is that it lets you transmit any volume of data, at any level of throughput, with queues that are created dynamically and scale automatically. SQS ensures that your message is delivered at least once, and also allows multiple consumers and senders to communicate using the same message queue. By default it offers:
Standard Queues: give the highest throughput, best-effort ordering, and at-least-once delivery.
FIFO Queues: First-In, First-Out Queues ensure that messages are processed only once, in the sequence in which they are sent.
A queue is just a temporary repository for messages awaiting processing and acts as a buffer between the component producer and the consumer. SQS provides an HTTP API over which applications can submit items into and read items out of a queue.
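As a quick sketch of that API with boto3 (the queue name and message body are placeholders):

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# create a standard queue and get its URL
queue_url = sqs.create_queue(QueueName="photo-processing-queue")["QueueUrl"]

# producer: submit a message describing a job
sqs.send_message(QueueUrl=queue_url, MessageBody='{"photo_id": 42, "action": "resize"}')

# consumer: long-poll for messages, process them, then delete them from the queue
messages = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
).get("Messages", [])
for msg in messages:
    print("processing:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])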
Use cases of SQS Service
Okay, now we understand roughly what SQS is. In what kind of circumstances would you use it?
For example, let’s say you have a service where people upload photos from their mobile devices. Once the photos are uploaded, your service needs to do a bunch of processing on them, e.g. scaling them to different sizes, applying different filters, extracting metadata. Each user’s request is put into a queue asynchronously, waiting to be processed. Once it’s processed, completion messages are sent back. When a server dies for some reason (services like Kubernetes will restore it), the messages go back to the queue and are picked up by another server. There is also the visibility timeout: if a message is not deleted by the consumer in time, it returns to the queue, and during this period no other consumers can receive or process the message.
So to reiterate, SQS does not do any load balancing or actual processing of the requests. It just acts as a communication bridge between applications so that the status of jobs can be monitored and requests are properly fulfilled without being lost.
Understanding Lambda and API Gateway
Next up is AWS Lambda and API Gateway.
Serverless cloud computing models automatically scale up your application, so you no longer need to have infrastructure-related concerns about surges in requests. Within the serverless architecture, there are models called Backend as a Service (BaaS) and Function as a Service (FaaS).
BaaS hosts and replaces a single component as a whole (e.g. Firebase authentication service, AWS Amplify).
FaaS is a type of service in which the features of an application are split into individual functions, and each function is individually hosted by the provider (e.g. AWS Lambda).
It’s not that one approach is better than the other; they are just two different approaches to serverless architecture. AWS Lambda is a typical example of FaaS-based architecture: we have a bare-minimum function that gets invoked based on certain predetermined events and gets torn down as soon as it is done processing.
How does Lambda work under the hood? A Lambda function runs inside a microVM (micro virtual machine). When an invocation is received, Lambda will launch a new microVM and load the code package in memory to serve the request. The time taken by this process is called startup time. MicroVMs are a new way of looking at virtual machines. Rather than being general purpose and providing all the potential functionality a guest operating system may require, microVMs seek to address the problems of performance and resource efficiency by specializing for specific use cases. By implementing a minimal set of features and emulated devices, microVM hypervisors can be extremely fast with low overhead. Boot times can be measured in milliseconds (as opposed to minutes for traditional virtual machines).
How does AWS Lambda connect with API Gateway?
AWS Lambda functions are event-driven, and API Gateway exposes a REST API endpoint online which acts as a trigger, scaling the AWS Lambda function up and down with incoming requests.
2.1 - Simple example with Lambda and API Gateway
We now have all the preliminary knowledge to code something up. Let's go to the AWS console and make a Lambda function and a corresponding API Gateway hook.
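The function body itself can stay tiny; a minimal handler for an API Gateway trigger (assuming the proxy integration) could look like this sketch:

import json


def lambda_handler(event, context):
    """Triggered by API Gateway; echoes back a greeting from the query string."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }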
In the previous post, I created a simple service that summarizes news content using an ML model, Streamlit, and Docker Compose. Well, there must be ways to improve the application, as it was mostly built on top of what I already knew about Docker. What is the next step for improving it? The direction is quite clear, as there are evident limitations with the current approach of using Docker Compose. It works for testing locally on my computer, but it's not scalable in production, as it's meant to run on a single host. To summarize:
You can implement health checks, but containers won't be recreated when they fail (absence of self-healing)
Absence of proper load balancer
Docker Compose is designed to run on a single host or cluster, while Kubernetes is more agile in incorporating multiple cloud environments and clusters.
Kubernetes is easier to scale beyond a certain point, plus you can utilize native services of AWS, Azure, and GCP to support our deployment.
Docker Swarm and Kubernetes are two possible options, but for scalability and compatibility, Kubernetes should be pursued.
Understanding Kubernetes
Kubernetes architecture can be simplified to the image below:
Containers: Package everything in Containers so that Kubernetes can run it as a service.
Node: It is a representation of a single machine in your cluster. In most production systems, a node will likely be either a physical machine in a datacenter or a virtual machine hosted on a cloud provider like GCP or AWS.
Cluster: A grouping of multiple nodes in a Kubernetes environment. Kubernetes runs orchestration on clusters to control/scale nodes and pods.
Pods: Kubernetes doesn’t run containers directly; instead it wraps one or more containers into a higher-level structure called a pod. Any containers in the same pod will share the same resources and local network. Containers can easily communicate with other containers in the same pod as though they were on the same machine, while maintaining a degree of isolation from others. Because pods are scaled up and down as a unit, all containers in a pod must scale together, regardless of their individual needs. This leads to wasted resources and an expensive bill. To resolve this, pods should remain as small as possible, typically holding only a main process and its tightly-coupled helper containers.
Understanding workloads
You have your applications running from Dockerfiles. In Kubernetes, these applications are referred to as workloads. Whether your workload is a single component or several that work together, on Kubernetes you run it inside a set of pods.
You can define the behaviour of individual pods, but it's often better to define them as a Deployment. A Deployment creates a ReplicaSet that in turn creates the number of replicated Pods indicated by the .spec.replicas field. You can choose to declare a ReplicaSet directly, but a Deployment is a higher-level concept and usually the preferred way to define things.
Jobs
There are many kinds of workloads besides Deployment. Another useful type would be Jobs. A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. You can also use a Job to run multiple Pods in parallel. By default, each pod failure is counted towards the .spec.backoffLimit.
Such a Job manifest can, for example, run a program that counts in parallel, with a failure limit of up to 4 set by .spec.backoffLimit, while .spec.parallelism controls the parallel computation features.
CronJob
CronJob is another useful workload type, performing regularly scheduled actions such as backups, report generation, and so on. You can use these to perform ETL, and choose to make it more robust with tools like Airflow.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"   # refer to the website for controlling the schedule parameters
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
When using a Kubernetes service, each pod is assigned an IP address. As this address may not be directly knowable, the Service API lets you expose an application running in Pods to be reachable from outside your cluster, so it’s an abstraction to help you expose groups of Pods over a network. Each Service object defines a logical set of endpoints (usually these endpoints are Pods) along with a policy about how to make those pods accessible.
Ingress is not actually a type of Service. Instead, it is an entry point that sits in front of multiple services in the cluster. It can be defined as a collection of routing rules that govern how external users access services running inside a Kubernetes cluster. Ingress is most useful if you want to expose multiple services under the same IP address. It is a powerful way to handle things, but you need to set up Ingress Controllers (there are multiple different providers, which can be difficult).
The way you define a Service is similar to workloads.
You define the communication protocol and the port range it will use to communicate.
And when defining the manifest file, you can, and should, bind resources and services together:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
    - containerPort: 80
      name: http-web-svc   # define the name
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc   # target is the pod defined above
A Service also provides load balancing. Clients call a single, stable IP address, and their requests are balanced across the Pods that are members of the Service. There are five general types of Service for load balancing:
ClusterIP
NodePort
LoadBalancer
ExternalName
Headless
ClusterIP
Internal clients send requests to a stable internal IP address. This is often the default option.
Kubernetes will assign a cluster-internal IP address to ClusterIP service. This makes the service only reachable within the cluster. ClusterIP is most typically used for Inter-service communication within the cluster. For example, communication between the front-end and back-end components of your app.
NodePort
NodePort service is the most primitive way to get external traffic directly to your service. NodePort, as the name implies, opens a specific port on all the Nodes (the VMs), and any traffic that is sent to this port is forwarded to the service. There are limitations like — only one service per port, but this gives a lot of room for customization.
LoadBalancer
With a LoadBalancer Service, clients send requests to the IP address of a network load balancer, and the load balancer decides how to direct clients to the backing Pods. The LoadBalancer Service is an extension of the NodePort Service, and each cloud provider (AWS, Azure, GCP, etc.) has its own native load balancer implementation. When you are using a cloud provider to host your Kubernetes cluster, you will almost certainly be using a LoadBalancer. There are many different types of load balancers, and we will learn about these in the future.
Building my own examples
Well, that is enough basic knowledge to get started. Now is the time to get our hands dirty with our own examples. In the previous two series of posts, I made examples using NLP models. But this isn't really related to my profession, as I work in the field of computer vision.
So in this example, I am going to:
Train a small image classifier that can run Inference on CPU
Serve the model using FastAPI, just as I served the NLP news model.
Scale up the workloads using Kubernetes
Use Minikube instead of deploying on the cloud for testing first.
Later the same architecture will be hosted in the cloud (AWS)
What is Minikube?
Minikube is most typically used for testing Kubernetes. A Kubernetes cluster can be deployed on either physical or virtual machines (Nodes). To get started with Kubernetes development, you can use Minikube. Minikube is a lightweight Kubernetes implementation that creates a VM on your local machine and deploys a simple cluster containing only one node. Minikube is available for Linux, macOS, and Windows systems. The Minikube CLI provides basic bootstrapping operations for working with your cluster, including start, stop, status, and delete.
It is important to be able to monitor the containers. Containerized apps are more dynamic: they may run across dozens or hundreds of containers that are short-lived and are created or removed by the container platform. You need a monitoring approach that is container-aware, with tools that can plug into the container platform for discovery and find all the running applications without a static list of container IP addresses. Unfortunately, there isn't one silver-bullet framework that can handle all the monitoring by itself (at least among those that are open source), but by combining multiple different frameworks, you can create a very solid monitoring environment. Monitoring becomes increasingly important when you later try to scale the app using Kubernetes.
Prometheus is an open-source systems monitoring and alerting toolkit. It is a self-hosted set of tools which collectively provide metrics storage, aggregation, visualization, and alerting. Think of it as a time series database that you can query.
Grafana: While Prometheus provides a well-rounded monitoring solution, we need dashboards to visualize what's going on within our cluster. That's where Grafana comes in. Grafana is a fantastic tool that can create outstanding visualizations. Grafana itself can't store data, but you can hook it up to various sources to pull metrics from, including Prometheus.
Prometheus has got first-class support for alerting using AlertManager. With AlertManager, you can send notifications via Slack, email, PagerDuty, and tons of other mediums when certain triggers go off.
Node Exporter, as the name suggests, is responsible for exporting hardware and OS metrics exposed by our Linux host. It enables you to measure various machine resources such as memory, disk, and CPU utilization. Node Exporter is maintained as part of the Prometheus project. This is a completely optional step and can be skipped if you do not wish to gather system metrics; think of it as a module to gather more data about the host system.
Node Exporter is for exporting local system metrics. For Docker (and Kubernetes), you will need cAdvisor. It essentially does the same thing as Node Exporter, but scrapes data for individual containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage, and network statistics. This data is exported per container and machine-wide.
Changing Docker Compose for Monitoring
Now that we know what each component does, it is time to make the corresponding changes. The code can be found here. Note that quite a few files need to be changed, and it takes some time to digest what is going on. First of all, let's see the whole Docker Compose script. You will realize that it has become much longer, so let's break it down component by component.
# docker-compose.yml
version: '3.8'

networks:
  monitoring:
    driver: bridge

# Persist data from Prometheus and Grafana with Docker volumes
volumes:
  prometheus_data: {}
  grafana_data: {}

x-logging:
  &default-logging
  driver: "json-file"
  options:
    max-size: "1m"
    max-file: "1"
    tag: ""
First of all, a bridge network is a Link Layer device which forwards traffic between network segments. A bridge can be a hardware device or a software device running within a host machine’s kernel. Docker Compose understands the idea behind running services for one application on one network. When you deploy an app using Docker Compose file, even when there’s no mention of specific networking parameters, Docker Compose will create a new bridge network and deploy the container over that network. Because they all belong to the same network, all our modules can easily talk to each other. There are other network modes like overlay used for Swarm, but we are mainly interested in bridge. Here, a persistent volume for both Prometheus and Grafana is defined, and also some logging components are initialized.
Configuring Grafana
Next up, we define our first service, Grafana, the visualization dashboard.
For Grafana, I am defining some config files for dashboards and datasources, which will be used by Grafana to display contents. These, however, do not have to be populated prior to provisioning; they can be generated after the dashboard is created, which will be discussed in more detail later. And obviously all the data rendered by Grafana comes from Prometheus, so Grafana naturally depends on it. The convention is to use port 3000 for Grafana.
Configuring Prometheus
Next up is the section for Prometheus. This is probably the most important component, as it scrapes data from the various other components.
The most important part of this section is that I am referring to a prometheus.yml file, which gets mounted into the container's volume. This file sets the rules for where Prometheus should be looking for data.
You can feed in IP addresses or component names as targets.
Configuring cAdvisor
Next up are the cAdvisor and Redis services. cAdvisor requires the Redis server to be initialized. cAdvisor will gather container metrics from this container automatically, i.e. without any further configuration.
Okay, finally all the components have been defined. There are quite a few details for each component, and understanding everything that's happening behind the scenes is quite difficult. But I understand the basics now (the purpose of each part, and how they should be configured), so that's enough to get going for now. Build with Docker Compose, and go to localhost:3000. At the beginning, your username and password should both be admin. Add sources, and based on the source, you can create any dashboard. You can look for various dashboard options here.
One thing to note is that the source URL pointing at Prometheus (which scrapes all the data) must be http://prometheus:9090. I tried pointing to http://localhost:9090 and was getting a connection error, and it took a while to figure that out.
Add Backend data to Grafana
Awesome, I see some metrics being generated on Grafana, but where are the metrics regarding the backend? I did ask Prometheus to scrape data from the backend, but they are missing on Grafana. Well, they are missing because I never asked the backend to generate metrics in the format that Prometheus wants.
Prometheus can collect multiple types of data, and you can find information about them here. The Prometheus client libraries offer four core metric types:
Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also “counts” that can go up and down, like the number of concurrent requests.
Histogram: A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
Summary: Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
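For reference, defining and updating these metric types with the official Python client looks roughly like this (the metric names and port are illustrative, not taken from the actual backend):

import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUESTS = Counter("app_requests", "Total number of requests served")  # exported as app_requests_total
IN_PROGRESS = Gauge("app_requests_in_progress", "Requests currently being handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

start_http_server(8000)  # exposes /metrics for Prometheus to scrape

while True:
    REQUESTS.inc()
    with IN_PROGRESS.track_inprogress(), LATENCY.time():
        time.sleep(random.random() / 10)  # simulated work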
Normally, in order to have your code generate the above metrics, you would need to use the client libraries provided by Prometheus. There is a general client library for Python as well. However, it is much easier to use a higher-level library specifically designed for FastAPI, like Prometheus FastAPI Instrumentator. Update your requirements.txt to add the instrumentator. Now you need to change the backend code like the following:

"""Simple scrapper that randomly scraps a website and gets sentences from it."""
from fastapi import FastAPI
from scrape import get_random_news
from summarize import summarize_random_news, Data
from prometheus_fastapi_instrumentator import Instrumentator
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/api/v1/get/news", status_code=200)
async def get_news():
    """Simple get function that randomly fetches a news content."""
    return get_random_news()


@app.get("/api/v1/get/summary", status_code=200)
async def get_summary(data: Data):
    """Simple get function that randomly fetches a news content."""
    return summarize_random_news(data)


Instrumentator().instrument(app).expose(app)
In order to use the Instrumentator, you need to add middleware to the FastAPI application. A middleware is a function that works with every request before it is processed by any specific path operation, and also with every response before returning it. CORS in CORSMiddleware stands for Cross-Origin Resource Sharing; it allows us to relax the security applied to an API. This is enough to cover the basics. You can import this metrics.json file to view stats related to the backend.
If you are a software engineer with some experience, it's very likely that you already have some experience with Docker. Distributed systems and containers have taken over the world, and essentially every application out there is containerized these days. Naturally, there is an immense amount of content associated with Docker beyond simple image build, pull, etc., and a lot of sophisticated tools like Kubernetes, Terraform, and Ansible operate on top of containers. I have used Docker here and there, and I thought it would be good to check how much I know about Docker and maybe fill in some gaps.
Docker is an open source platform that offers container services. Containers are lightweight virtual machines (VMs); they leverage features of the host operating system to isolate processes and control the processes' access to CPUs, memory, and disk space. Unlike VMs, which use a hypervisor that virtualizes physical hardware, containers virtualize the operating system (typically Linux or Windows), so each individual container contains only the application and its libraries and dependencies, allowing applications to run on any operating system without conflicts. And typically an application has more than one container running: it consists of multiple containers (i.e. microservices) that are in charge of different aspects of the app, with images hosted in a registry like Elastic Container Registry (ECR), talking to each other over a virtual network without being exposed to the internet.
Okay, now we have reviewed what Docker is and why we are using it. Let's do the very basics according to the outline of the book. Assuming you already have Docker installed in your environment, let's dive straight into it. I won't even bother pulling pre-existing images and interacting with them, because I assume we all already know how to do that.
# pull the docker image hosted on DockerHub already
$ docker image pull diamol/ch03-web-ping

# run the ping application
$ docker container run -d --name web-ping diamol/ch03-web-ping

# check the logs collected by Docker
$ docker container logs web-ping

# tear down the container
$ docker rm -f web-ping

# run a version that pings a specific target, google in this case
$ docker container run --env TARGET=google.com diamol/ch03-web-ping
Now, to build an image that does the above ourselves, we obviously need to write a Dockerfile. We can simply create one mirroring the image we pulled from DockerHub. Refer to the Dockerfile.
$ docker image build --tag test .
$ docker container run --env TARGET=google.com test
Very easy. Next up, we have a multi-stage Dockerfile for a Java application.
Working with multi-stage Dockerfile
Each stage starts with a FROM instruction, and you can optionally give stages a name with the AS parameter. In this Java example, Maven is a build automation tool used primarily for Java projects, and OpenJDK is a freely distributable Java runtime and developer kit. Maven uses an XML format to describe the build, and the Maven command line is called mvn. We can package everything together so that compiling and running the application happens in one go, much like a make process.
FROM diamol/maven AS builder

WORKDIR /usr/src/iotd
COPY pom.xml .
RUN mvn -B dependency:go-offline
COPY . .
RUN mvn package

# app
FROM diamol/openjdk

WORKDIR /app
COPY --from=builder /usr/src/iotd/target/iotd-service-0.1.0.jar .

EXPOSE 80
ENTRYPOINT ["java", "-jar", "/app/iotd-service-0.1.0.jar"]
The builder stage starts by creating a working directory in the image and then copying in the pom.xml file, which is the Maven definition of the Java build.
The first RUN statement executes a Maven command, fetching all the application dependencies. This is an expensive operation, so it has its own step to make use of Docker layer caching. If there are new dependencies, the XML file will change and the step will run. If the dependencies haven't changed, the layer cache is used.
Next the rest of the source code is copied in—COPY . . means “copy all files and directories from the location where the Docker build is running, into the working directory in the image.”
mvn package compiles and packages the application. This creates the JAR file.
The app stage now copies the compiled JAR file from the builder stage.
App exposes port 80.
ENTRYPOINT is an alternative to CMD, executing the compiled Java JAR.
Distributed Application: My Python Example - News Summarizer
Most applications don’t run in one single component. Even large old apps are typically built as frontend and backend components, which are separate logical layers running in physically distributed components. Docker is ideally suited to running distributed applications—from n-tier monoliths to modern microservices. Each component runs in its own lightweight container, and Docker plugs them together using standard network protocols. This involves defining Docker Compose.
One thing that I absolutely hate doing is blindly following the textbook and running code that someone else has written for you. This does not help learning in any way, especially since this code is not written in Python (it's Java, JavaScript, etc.). The best thing to do is to define your own examples. This is a very simple application, as it only has two parts to it: a backend and a frontend. The code can be found in this repository. Here is the general outline:
The backend is written using FastAPI, as I have already gotten quite familiar with it while reviewing RESTful application structure. There are two parts to the backend of our application:
A web scraper that randomly scrapes a news article and its content from Daily Mail
A HuggingFace machine learning model that summarizes the news content.
FrontEnd
And finally, a lite frontend module generated with Streamlit. Streamlit turns simple scripts into shareable web apps literally in minutes. Of course, there are certain limitations, so it cannot compete against full-stack frameworks like Django or more conventional JS frameworks like Vue and React, but for fast application development to test ideas, Streamlit is a clear winner.
The overall idea is to have a very simple frontend page that loads and shows a news article on button click (which calls the web scraper backend), and another button that turns the loaded news article into an input for the ML summarization model, loading the summary on the frontend when the inference finishes.
News Summarizer backend code
Let’s look into the details of the backend. The very first piece of logic loads the news article. Python has an HTML parsing library called BeautifulSoup, and all I am doing is reading the Daily Mail news page and randomly selecting one news article to generate an input.
import requests
from bs4 import BeautifulSoup
import numpy as np
import re
import random


def get_random_news():
    url = "https://www.dailymail.co.uk"
    r1 = requests.get(url)
    coverpage = r1.content
    soup1 = BeautifulSoup(coverpage, "html5lib")
    coverpage_news = soup1.find_all("h2", class_="linkro-darkred")
    list_links = []

    # choose a random number
    n = random.randint(0, len(coverpage_news))
    final_data = {}
    list_links = []

    # Getting the link of the article
    link = url + coverpage_news[n].find("a")["href"]
    list_links.append(link)

    # Getting the title
    title = coverpage_news[n].find("a").get_text()

    # Reading the content (it is divided in paragraphs)
    article = requests.get(link)
    article_content = article.content
    soup_article = BeautifulSoup(article_content, "html5lib")
    body = soup_article.find_all("p", class_="mol-para-with-font")

    # Unifying the paragraphs
    list_paragraphs = []
    for p in np.arange(0, len(body)):
        paragraph = body[p].get_text()
        list_paragraphs.append(paragraph)
    final_article = " ".join(list_paragraphs)

    # Removing special characters
    final_article = re.sub("\\xa0", "", final_article)
    final_data["title"] = title.strip()
    final_data["content"] = final_article
    return final_data
Now that the input is ready, I need an ML model that can run inference and generate summaries. The core ML model is just a generic summarization NLP model from Hugging Face. I did not bother retraining or optimizing the model, as the purpose of this example is to practice Docker, not to create a state-of-the-art model. FastAPI uses Pydantic for data validation, so I created a very basic Data class here as well.
from transformers import pipeline
import warnings
from pydantic import BaseModel

warnings.filterwarnings("ignore")


class Data(BaseModel):
    """Data model for the summary. Keep it simple."""

    content: str


def summarize_random_news(content: Data) -> dict:
    """Summarizes a random news."""
    summarizer = pipeline("summarization", model="stevhliu/my_awesome_billsum_model")
    summarized_content = summarizer(content.content)
    actual_summary = summarized_content[0]["summary_text"]
    return {"summary": actual_summary}


if __name__ == "__main__":
    summary = summarize_random_news()
    print(summary)
It is possible to put these two functions into two different services, but to keep things simple, I decided to group them together under a single backend service. And of course, I need the FastAPI main script to provide endpoints for the above two functions. I do not have any DB setup at the moment, so there are no POST, DELETE, or PUT endpoints; there is only a simple GET endpoint for each function. There are no parameters for get_news, as it just generates a random news input. This input is passed to get_summary, and Pydantic validates that it is a proper string.

"""Simple scrapper that randomly scraps a website and gets sentences from it."""
from fastapi import FastAPI
from scrape import get_random_news
from summarize import summarize_random_news, Data

app = FastAPI()


@app.get("/api/v1/get/news", status_code=200)
async def get_news():
    """Simple get function that randomly fetches a news content."""
    return get_random_news()


@app.get("/api/v1/get/summary", status_code=200)
async def get_summary(data: Data):
    """Simple get function that randomly fetches a news content."""
    return summarize_random_news(data)
News Summarizer frontend code
Everything in the frontend is defined with Streamlit. Here, I need to manage the state of the news data and change it with the buttons.
import streamlit as st
import pandas as pd
import requests
import os

# init session variables
if "news" not in st.session_state:
    st.session_state.news = "Click the button to get news"
if "summary" not in st.session_state:
    st.session_state.summary = "Click the button to get summary"
Then I need equivalent frontend functions to trigger the backend functions. Here, hard-coding the backend endpoint is a very bad practice; what if the URL changes? Then you need to change every single one of them. So the sensible way to do it is to declare them in the Docker Compose file and use environment variables.
def display_news():
    """Called upon onclick of get_news_button"""
    if "BACKEND_PORT" not in os.environ:
        backend_port = 5000
    else:
        backend_port = str(os.getenv("BACKEND_PORT"))

    # get the news from backend
    response = requests.get(f"http://backend:{backend_port}/api/v1/get/news")

    # response successful, return the data
    if response.status_code == 200:
        content = response.json()
        text = content["content"]
        title = content["title"]
        return f"{title}\n\n{text}"
    else:
        return "Port set properly, but backend did not respond with 200"


def get_summary():
    """Called upon onclick of summarize_button"""
    if st.session_state.news == "Click the button to get news":
        return "Get news first, and then ask to summarize"
    if "BACKEND_PORT" not in os.environ:
        backend_port = 5000
    else:
        backend_port = str(os.getenv("BACKEND_PORT"))

    # get the news from backend using a json obj (based on the Pydantic definition)
    json_obj = {"content": st.session_state.news}
    response = requests.get(
        f"http://backend:{backend_port}/api/v1/get/summary", json=json_obj
    )
    if response.status_code == 200:
        content = response.json()
        summary = content["summary"]
        return summary
    else:
        return "Port set properly, but backend did not respond with 200"
The last part is the actual front-end portion of the page, which can be defined in literally a few lines. Creating buttons, dividers, etc. is very easy. The buttons trigger state changes, and the bottom text section displays whatever the current session state is set to.
st.title("Summarize News using AI 🤖")
st.markdown(
    "Microservice test application using FastAPI (backend), HuggingFace (inference), "
    "and Streamlit (frontend). Using Docker Compose to turn into microservice. "
    "To get a news article, hit the Get News Article. To summarize the article, hit the Summarize button.",
    unsafe_allow_html=False,
    help=None,
)

col1, col2, col3, col4, col5, col6, col7, col8 = st.columns(8)
col1, col2, col3, col4 = st.columns(4)
with col1:
    pass
with col2:
    get_news_button = st.button("Get News")
with col3:
    summarize_buttom = st.button("Summarize")
with col4:
    pass

if get_news_button:
    st.session_state.news = display_news()
if summarize_buttom:
    st.session_state.summary = get_summary()

st.header("News Article", help=None)
st.markdown(st.session_state.news)
st.divider()
st.subheader("Summary")
st.markdown(st.session_state.summary)
News Summarizer Dockerfiles
And of course, there are the Docker Compose file and Dockerfiles that glue everything together and actually make things run in harmony. First of all, the Docker Compose file wraps things together. Here, I am providing an ENV file so that I do not have to hardcode the port numbers for the frontend and backend. In terms of functionality, defining a command in Docker Compose and defining it inside the Dockerfile makes no difference, so I am defining the backend command to start the backend server in Docker Compose. Health checks are implemented to periodically check the status of the containers; note that there are limitations in terms of self-healing, which will be addressed in the next post.
You can see from the above that I am defining the backend network alias as backend, so that I do not have to hardcode any IP addresses for the frontend to receive data from the backend. The frontend also depends on the backend: use depends_on to express the startup and shutdown dependencies between services, so the frontend always waits for the other services to be ready. For scaling, you need to make sure that you are not binding yourself to a specific host port; otherwise you will get a message saying your port has already been allocated when you try to scale your application like below:
docker compose up -d --scale backend=3
If your ports are set up properly, you can create multiple instances of the backend like below. Again, this approach has limitations regarding load balancing, which will be addressed in the next post.
Frontend Dockerfile
And of course, you need the actual Dockerfiles for the frontend and backend. I am defining the frontend start command inside the Dockerfile, but this could be defined inside the Docker Compose file as well.
# reduced python as base image
FROM python:3.8-slim-buster
# set a directory for the app
WORKDIR /usr/src/app
# copy all the files to the container
# copy all the files to the container
COPY . .
# pip install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# RUN apt-get -y update; apt-get -y install curl
# expose the port set in the environment
EXPOSE ${FRONTEND_PORT}
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
Backend Dockerfile
The backend is even simpler. All that happens here is copying the files over to the container and installing the requirements. The backend needs PyTorch, so you could technically use the PyTorch containers as well.
FROM python:3.9
# FROM nvcr.io/nvidia/pytorch:23.04-py3
WORKDIR /app
# copy requirements first so the pip install layer is cached for faster rebuilds
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
# copy everything over to the container
COPY ./ /app/
Final Result
Following the frontend design, the two buttons generate the input and the output. We can confirm that everything works as it should:
And the summary is generated perfectly as well.
Awesome, in the next post, I will try to improve this simple summary app.
With my background in Machine Learning, the natural choice of programming language for me was Python, which was (and still is) overwhelmingly popular and the most dominant language in the field of Machine Learning. I have already spent years programming in Python, and I really love the Zen of Python. Like any other programming language, however, Python is not perfect. Python is innately slower than modern compiled languages like Go, Rust, and C/C++, because it sacrificed speed for easier usage and more readable code. But why is it slow exactly?
Common characteristics of Python are:
Interpreted language
Dynamically typed
GC Enabled Language
If you have ever studied a language like C++, you know that compiled languages have characteristics exactly opposite to the above. C compilers like GCC convert the source code into machine code and generate executables that run directly on the CPU. Python, on the other hand, has no such ahead-of-time compile step. When the source code is executed, Python compiles it into bytecode, and the interpreter executes the bytecode instructions on a virtual machine, one by one. Because the language is dynamically typed, we do not specify the type of a variable beforehand, so the interpreter has to wait for the actual value at runtime to determine whether what we are trying to do is legal. These errors cannot be caught at compile time, because their validity can only be checked by the interpreter.
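To make the bytecode step concrete, here is a small sketch using the standard dis module, which disassembles a function into the bytecode instructions that the CPython virtual machine runs. The add() function is just an illustrative example; note how the type error only shows up when the offending call actually executes.

import dis


def add(a, b):
    # no type information here; '+' is resolved at runtime
    return a + b


# print the CPython bytecode instructions for add()
dis.dis(add)

print(add(1, 2))          # works: int + int
try:
    add("1", 2)           # compiles to bytecode just fine...
except TypeError as err:
    print(err)            # ...and only fails here, at runtime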
When learning C/C++, something that always makes you bang your head against the wall is memory management, with its notorious segmentation faults. Python has automatic garbage collection, so you never have to worry about things like pointers, but it is definitely not the most efficient way to handle memory. You can never shoot yourself in the foot, but it slows you down at the same time.
Global Interpreter Lock (GIL)
Another factor that slows Python down is the Global Interpreter Lock (GIL). By design, the Python interpreter executes on a single thread, on a single CPU core, at any given time. Since the GIL allows only one thread to execute Python bytecode at a time, even in a multi-threaded program running on multiple cores, you may wonder why the Python developers created this feature in the first place.
The GIL prevents race conditions in which multiple threads access and modify the same piece of memory at once, which can corrupt the interpreter's internal state (such as its reference counts), cause memory leaks, and produce all sorts of bugs that Python cannot handle. This was a practical solution, since Python manages memory for you, and early Python did not even have the concept of threading.
Languages like Rust guarantee thread safety (no two threads accessing the same data at the same time without synchronization) through their memory-safety model and standard library features like channels, mutexes, and Arc (Atomically Reference Counted) pointers. Python's GIL does not guarantee that your code is thread-safe; it only ensures that one thread runs Python bytecode at a time. So is there a concept of multi-threading in Python? Yes, of course. This will be discussed in the next section.
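As a quick illustration of the GIL's effect, here is a small sketch (count_down is an illustrative function, and the exact timings depend on your machine) comparing a CPU-bound loop run sequentially and on two threads. Under the GIL, the threaded version does not finish in roughly half the time, the way you might expect on two cores.

import time
import threading


def count_down(n=10_000_000):
    # pure-Python, CPU-bound work
    while n > 0:
        n -= 1


# sequential: two calls back to back
start = time.perf_counter()
count_down()
count_down()
print(f"sequential:  {time.perf_counter() - start:.2f}s")

# two threads: the GIL lets only one thread execute Python bytecode at a time,
# so this usually takes about as long as (or longer than) the sequential run
start = time.perf_counter()
t1 = threading.Thread(target=count_down)
t2 = threading.Thread(target=count_down)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"two threads: {time.perf_counter() - start:.2f}s")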
Multiprocessing and Threading
In the previous few sections I discussed the limitations of Python. Some of these problems are simply impossible to overcome due to the nature of the language, but Python has made continuous improvements in many areas, and that is how it got to where it stands today. Many popular Python packages (and pretty much all ML libraries) are written in C to offer competitive speed, and Python also ships libraries for multiprocessing and threading, which give an additional speed boost. This blog post will introduce various aspects of multiprocessing and multi-threading.
Many people mistakenly use multiprocessing and threading interchangeably, but they are not actually the same.
The illustration above gives a very nice overview of the difference between threading and multiprocessing.
First of all, threading is a great way to speed up an application on I/O-bound problems, where threads within a single process run concurrently. For example, say you have a multi-stage pipeline in your application where the user interacts with a server. If one of the stages is a significant bottleneck and the process runs purely sequentially, everything takes much longer, since the bottleneck stage must complete before the next stage can start. If the stages run concurrently, however, the entire application should not take much longer than the bottleneck stage itself.
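A minimal sketch of that idea, using concurrent.futures from the standard library; fake_io_task is an illustrative stand-in for an I/O-bound stage such as a network call. The five simulated requests finish in roughly one second instead of five.

import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(i):
    # stand-in for a network request or a disk read
    time.sleep(1)
    return f"task {i} done"


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_io_task, range(5)))
print(results)
print(f"finished in {time.perf_counter() - start:.2f}s")   # ~1s, not ~5s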
This is where multiprocessing comes in. When the program is CPU-intensive and does not need to do any I/O or user interaction, you can use multiple processes to speed up the computation by breaking the problem into chunks and delegating a chunk to each core. More processes can be created if your computer has more cores, but it becomes less scalable once you spin up more processes than your machine has cores.
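As a rough sketch of that chunking idea (the worker function and the chunk sizes here are made up for illustration; Pool itself is one of the four parts covered below), splitting a CPU-bound sum across one process per core:

import multiprocessing as mp


def sum_of_squares(chunk):
    # CPU-bound work on one chunk
    return sum(x * x for x in chunk)


if __name__ == '__main__':
    numbers = range(2_000_000)
    # break the problem into roughly one chunk per core
    n_cores = mp.cpu_count()
    chunk_size = len(numbers) // n_cores
    chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]

    with mp.Pool(processes=n_cores) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    print(sum(partial_sums))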
Python Multiprocessing
Now that we have covered the basics, it is time to dive in. In this post, I will cover the multiprocessing library, which is the core Python module for multiprocessing. I have already used multiprocessing to some extent for work (for a database migration), so it is easier to start from something I am familiar with. The multiprocessing library offers four major parts:
Process
Locks
Queues
Pools
Process
A process is the most basic unit of execution here. Every process gets its own copy of the Python interpreter and its own memory space, allowing multiple jobs to be executed simultaneously without conflicts. There are two types of processes:
Parent Process: The main process that can start another child process.
Child process: a process created by another process, also called a subprocess. It can only have one parent process, like how a person can only have one set of biological parents. A child process may become orphaned if its parent process is terminated.
The multiprocessing.Process instance then provides a Python-based reference to the underlying native process. Let's try creating two processes that each run a function that sleeps.
import time
import multiprocessing as mp


def sleep(sleep_sec=1):
    process = mp.current_process()
    pid = process.pid
    print(f'process {pid}: Sleeping for {sleep_sec} seconds \n')
    time.sleep(sleep_sec)
    print(f'process {pid} Finished sleeping \n')


# create two processes
start_time = time.perf_counter()
p1 = mp.Process(target=sleep)
p2 = mp.Process(target=sleep)

# start both processes
p1.start()
p2.start()

finish_time = time.perf_counter()

# This must be called from the parent process
if mp.current_process().name == 'MainProcess':
    print(f"Program finished in {(finish_time - start_time):.3f} seconds")
But if you look at the output, you will notice that it is a bit funny. One would expect the sleep and finish prints to appear before the final "Program finished" message from the parent process. But instead, you get:
Yes, it is because the processes are started without being joined, so the main process, which itself only takes about 0.02 seconds, finishes without waiting for p1 and p2 to do their part. So if you fix this up:
import time
import multiprocessing as mp

# sleep() is the same function defined in the previous snippet

start_time = time.perf_counter()

# save all the processes in a list
processes = []

# create 3 processes that all sleep
for i in range(3):
    p = mp.Process(target=sleep)
    p.start()
    processes.append(p)

# join all the processes so the parent waits for them to finish
for p in processes:
    p.join()

finish_time = time.perf_counter()

# This must be called from the parent process
if mp.current_process().name == 'MainProcess':
    print(f"Program finished in {(finish_time - start_time):.3f} seconds")
Okay, now we understand how processes are started and joined. But one thing to note is that there are actually three different start methods:
spawn: starts a brand-new Python interpreter process.
fork: copies an existing Python process.
forkserver: starts a server process from which future child processes will be forked.
You can set a specific start method like this:
if __name__ == '__main__':
    # set the start method
    multiprocessing.set_start_method('spawn')
This isn’t too important for now, so further details are not going to be discussed.
Locks
Next up is the Lock. We learned that a process is an encapsulated Python program. Processes often share data or resources, and a mutual exclusion lock (mutex) protects shared resources and prevents race conditions, since race conditions can easily corrupt data and create unwanted results. A lock keeps the data consistent between jobs by preventing other processes from accessing it until the lock is released.
Let's look into the multiprocessing.Lock class. A lock has two states: locked and unlocked.
# create a lock
lock = multiprocessing.Lock()

# acquire the lock
lock.acquire()

# release the lock
lock.release()
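To make this more concrete, here is a minimal sketch (the worker function and counter are illustrative) where a lock shared between several processes protects a shared integer so that no increments are lost:

import multiprocessing as mp


def increment(lock, counter, n=1000):
    for _ in range(n):
        # the with-statement acquires and releases the lock around the critical section
        with lock:
            counter.value += 1


if __name__ == '__main__':
    lock = mp.Lock()
    counter = mp.Value('i', 0)   # shared integer
    processes = [mp.Process(target=increment, args=(lock, counter)) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(counter.value)         # 4000, with no lost updates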
A lock is more useful together with queues, so let's discuss queues now.
Queues
This is the same queue as the data structure, but here we are specifically referring to a first-in, first-out (FIFO) queue in the context of multiprocessing. Data can be placed on a queue and picked up by another process when it becomes available, allowing us to break tasks into smaller parts that can be processed simultaneously.
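A minimal producer/consumer sketch with multiprocessing.Queue (the function names and the None sentinel are illustrative choices): one process puts work on the queue, and another takes items off until it sees the sentinel.

import multiprocessing as mp


def producer(queue):
    for i in range(5):
        queue.put(i)          # place work on the queue
    queue.put(None)           # sentinel: tell the consumer we are done


def consumer(queue):
    while True:
        item = queue.get()    # blocks until an item is available
        if item is None:
            break
        print(f"consumed {item}")


if __name__ == '__main__':
    queue = mp.Queue()
    p1 = mp.Process(target=producer, args=(queue,))
    p2 = mp.Process(target=consumer, args=(queue,))
    p1.start(); p2.start()
    p1.join(); p2.join()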
Neural networks have been around for a very long time: multi-layer perceptrons (MLPs) were already being analyzed by Marvin Minsky and Seymour Papert in their 1969 book on perceptrons, and the gradient descent idea behind backpropagation is even older. The potential of neural networks, however, has only been truly realized in the past decade, thanks to the astonishing advancement of computers and of the datasets that came with them. For 2D image recognition, the famous ImageNet dataset was released in 2009, and the major breakthrough happened in 2012, when a convolutional neural network (CNN) designed by Alex Krizhevsky, published with Ilya Sutskever and Krizhevsky's PhD advisor Geoffrey Hinton, slashed the existing error rate on ImageNet visual recognition to 15.3 percent. The CNN was dubbed "AlexNet" in this legendary paper.
The concept of ConvNets was first introduced in the 1980s, but AlexNet was the first clear demonstration that deeply stacked ConvNets could be so effective on image-related tasks. It has been ten years since the AlexNet paper was released, and nowadays transformer models are being adopted for vision tasks as well, yet CNNs still remain mainstream due to their incredible efficiency.
Understanding CNN
When coding neural networks, we use CNN blocks all the time. But it is extremely important to understand the mathematics and the reason why they actually work in practice. The building block is the convolution operation, which for continuous one-dimensional signals is defined as
\[(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau\]
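The continuous definition above has a direct discrete analogue, which is what convolution layers actually compute over pixels. A small sketch (assuming NumPy; the signal and kernel values are made up) comparing a hand-written sum against np.convolve:

import numpy as np

f = np.array([1.0, 2.0, 3.0])   # signal
g = np.array([0.5, 0.5])        # kernel

# discrete convolution: (f * g)[n] = sum_k f[k] * g[n - k]
manual = np.zeros(len(f) + len(g) - 1)
for n in range(len(manual)):
    for k in range(len(f)):
        if 0 <= n - k < len(g):
            manual[n] += f[k] * g[n - k]

print(manual)                   # [0.5 1.5 2.5 1.5]
print(np.convolve(f, g))        # same result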