           _  __     _                          _
          | |/ /   _| |__   ___ _ __ _ __   ___| |_ ___  ___ 
          | ' / | | | '_ \ / _ \ '__| '_ \ / _ \ __/ _ \/ __|
          | . \ |_| | |_) |  __/ |  | | | |  __/ ||  __/\__ \
          |_|\_\__,_|_.__/ \___|_|  |_| |_|\___|\__\___||___/
                                                             

Introduction

Functions

  1. Automatic Deployment
  2. Load Distribution
  3. Auto-Scaling
  4. Monitoring & Health Check
  5. Replacement of Failed Containers

Architecture

  1. Pod: Smallest Unit in the Kubernetes World
Anatomy of a Pod
           ┌─────────────────────────────────────────────────────┐
           │                                                     │
           │                 POD                                 │
           │                                                     │
           │   ┌──────────────┐          ┌─────────────┐         │
           │   │              │          │             │         │
           │   │  CONTAINER0  │          │  CONTAINER2 │         │
           │   └──────────────┘          └─────────────┘         │
           │                                                     │
           │   ┌──────────────┐                                  │
           │   │              │                                  │
           │   │  CONTAINER1  │              .....               │
           │   │              │                                  │
           │   └──────────────┘                                  │
           │                                                     │
           │   ┌──────────────────────────┐                      │
           │   │  Shared Resource         │                      │
           │   │  * IP                    │                      │
           │   │  * Volumes               │                      │
           │   └──────────────────────────┘                      │
           └─────────────────────────────────────────────────────┘
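
A minimal sketch of what the diagram shows, in manifest form (all names are hypothetical): two containers in one Pod sharing the Pod's IP and a volume.

apiVersion: v1
kind: Pod
metadata:
  name: two-container-pod        # hypothetical name
spec:
  volumes:
  - name: shared-data            # volume shared by both containers
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/msg && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data

Because both containers share the Pod's network namespace, they can also reach each other over localhost.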
          
Anatomy of Kubernetes Cluster
       ┌────────────────────────────────────────────────────────────────────────────┐
       │                                                                            │
       │                             Kubernetes Cluster                             │
       │                                                                            │
       │   ┌────────────────────────────────────────┐        ┌─────────────────┐    │
       │   │             Node0                      │        │      Node1      │    │
       │   │                                        │        └─────────────────┘    │
       │   │                                        │                               │
       │   │ ┌───────────────────────────────────┐  │                               │
       │   │ │           Pod0                    │  │        ┌─────────────────┐    │
       │   │ │                                   │  │        │      Node2      │    │
       │   │ │                                   │  │        └─────────────────┘    │
       │   │ │ ┌────────────┐  ┌────────────┐    │  │                               │
       │   │ │ │  Container0│  │ Container1 │    │  │                               │        
       │   │ │ └────────────┘  └────────────┘    │  │                               │
       │   │ └───────────────────────────────────┘  │                               │
       │   │              ....                      │               .....           │
       │   │                                        │                               │
       │   │   ┌────────────┐  ┌────────────┐       │                               │
       │   │   │  Pod0      │  │   Pod1     │       │                               │
       │   │   └────────────┘  └────────────┘       │        ┌─────────────────┐    │
       │   │                                        │        │      Node3      │    │
       │   └────────────────────────────────────────┘        └─────────────────┘    │
       │                                                                            │
       │                                                                            │
       └────────────────────────────────────────────────────────────────────────────┘
                
Architecture of Nodes
     ┌──────────────────────────────────────────────────────────────────────────────┐
     │                                                                              │
     │                          Kubernetes Cluster                                  │
     │                                                                              │
     │                                                                              │
     │                            ┌─────────────────┐                               │
     │                            │                 │                               │
     │             ┌──────────────┤   Master Node   ├───────────────┐               │
     │             │              └─────┬───────────┘               │               │
     │             │                    │                           │               │
     │             │                    │                           │               │
     │             ▼                    │                           ▼               │
     │       ┌────────────────┐         │                  ┌──────────────────┐     │
     │       │                │         │                  │                  │     │
     │       │  Worker Node   │         │                  │   Worker Node    │     │
     │       └────────────────┘         │                  └──────────────────┘     │
     │                                  ▼                                           │
     │                            ┌───────────────┐                                 │
     │                            │               │                                 │
     │                            │  Worker Node  │                                 │
     │                            └───────────────┘                                 │
     │                                                                              │
     │                                                                              │
      └──────────────────────────────────────────────────────────────────────────────┘

Kubernetes Services
  ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                                                                                         │
  │ ┌──────────────────────────────────────────┐                                                            │
  │ │                                          │                                                            │
  │ │                                          │                                                            │
  │ │         Master Nodes                     │                                                            │
  │ │                                          │      ┌─────────────────────────────────────────────────┐   │
  │ │  ┌────────────────────────────────────┐  │      │                                                 │   │
  │ │  │ * Api Server         ────────────────────┐   │               Worker Node                       │   │
  │ │  │ * Scheduler                        │  │  │   │                                                 │   │
  │ │  │ * Kube controller Manager          │  │  │   │                                                 │   │
  │ │  │ * Cloud Controller Manager         │  │  │   │ ┌───────────┐                                   │   │
  │ │  │ * etcd                             │  │  ├────►│  Kubelet  │              ┌──────────────────┐ │   │
  │ │  └────────────────────────────────────┘  │  │   │ └───────────┘              │container runtime │ │   │
  │ │                                          │  │   │ ┌────────────┐             └──────────────────┘ │   │
  │ │                                          │  │   │ │  kube-proxy│                                  │   │
  │ │                                          │  │   │ └────────────┘                                  │   │
  │ │ ┌────────────┐                           │  │   └─────────────────────────────────────────────────┘   │
  │ │ │ Kubelet    │     ┌──────────────────┐  │  │                                                         │
  │ │ └────────────┘     │ Container Runtime│  │  │   ┌─────────────────────────────────────────────────┐   │
  │ │                    └──────────────────┘  │  │   │                                                 │   │
  │ │ ┌────────────┐                           │  │   │               Worker Node                       │   │
  │ │ │ Kube-proxy │                           │  │   │                                                 │   │
  │ │ └────────────┘                           │  │   │                                                 │   │
  │ │                                          │  │   │ ┌───────────┐                                   │   │
  │ └──────────────────────────────────────────┘  └────►│  Kubelet  │              ┌──────────────────┐ │   │
  │                                                   │ └───────────┘              │container runtime │ │   │
  │                                                   │ ┌────────────┐             └──────────────────┘ │   │
  │                                                   │ │  kube-proxy│                                  │   │
  │                                                   │ └────────────┘                                  │   │
  │                                                   └─────────────────────────────────────────────────┘   │
  │                                                                                                         │
  └─────────────────────────────────────┬────┬──────────────────────────────────────────────────────────────┘
                                        │    │
                                        │    │
                                        │    │
                                        ▼    ▼
                                 ┌──────────────────┐
                                 │  Managed Using   │
                                 │      kubectl     │
                                 └──────────────────┘

Kubectl Commands

  1. kubectl get nodes | pods | namespaces
  2. kubectl get pods --namespace=kube-system
  3. kubectl run nginx --image=nginx

Architecture

![[Pasted image 20230512203536.png]]


Control Plane

Control Plane Components

  1. API Server
  2. Scheduler
  3. Controller Manager
  4. KV Store (ETCD)

Can be abbreviated as SACK - Scheduler, API Server, Controller Manager, and KV Store (which is etcd).

The Control Plane runs on the master (control plane) node(s).

API Server

Source: `/home/akuma/SoopaProject/Study/k8s/k8s-first-source-code/pkg/apiserver`

This is a checkout of the first Kubernetes commit (hash 2c4b3a562ce34cddc3f8218a2c4d11c7310e6d56), which already contains quite a bit of information about the design.

// RESTStorage is a generic interface for RESTful storage services
type RESTStorage interface {
	List(*url.URL) (interface{}, error)
	Get(id string) (interface{}, error)
	Delete(id string) error
	Extract(body string) (interface{}, error)
	Create(interface{}) error
	Update(interface{}) error
}

// Status is a return value for calls that don't return other objects
type Status struct {
	success bool
}

// ApiServer is an HTTPHandler that delegates to RESTStorage objects.
// It handles URLs of the form:
// ${prefix}/${storage_key}[/${object_name}]
// Where 'prefix' is an arbitrary string, and 'storage_key' points to a RESTStorage object stored in storage.
//
// TODO: consider migrating this to go-restful which is a more full-featured version of the same thing.
type ApiServer struct {
	prefix  string
	storage map[string]RESTStorage
}

The api-server mostly performs CRUD against etcd and forwards requests to the respective components; handleREST below maps HTTP verbs onto the RESTStorage interface (a minimal in-memory RESTStorage sketch follows the quoted handler).

func (server *ApiServer) handleREST(parts []string, url *url.URL, req *http.Request, w http.ResponseWriter, storage RESTStorage) {
	switch req.Method {
	case "GET":
		switch len(parts) {
		case 1:
			controllers, err := storage.List(url)
			if err != nil {
				server.error(err, w)
				return
			}
			server.write(200, controllers, w)
		case 2:
			task, err := storage.Get(parts[1])
			if err != nil {
				server.error(err, w)
				return
			}
			if task == nil {
				server.notFound(req, w)
				return
			}
			server.write(200, task, w)
		default:
			server.notFound(req, w)
		}
		return
	case "POST":
		if len(parts) != 1 {
			server.notFound(req, w)
			return
		}
		body, err := server.readBody(req)
		if err != nil {
			server.error(err, w)
			return
		}
		obj, err := storage.Extract(body)
		if err != nil {
			server.error(err, w)
			return
		}
		storage.Create(obj)
		server.write(200, obj, w)
		return
	case "DELETE":
		if len(parts) != 2 {
			server.notFound(req, w)
			return
		}
		err := storage.Delete(parts[1])
		if err != nil {
			server.error(err, w)
			return
		}
		server.write(200, Status{success: true}, w)
		return
	case "PUT":
		if len(parts) != 2 {
			server.notFound(req, w)
			return
		}
		body, err := server.readBody(req)
		if err != nil {
			server.error(err, w)
		}
		obj, err := storage.Extract(body)
		if err != nil {
			server.error(err, w)
			return
		}
		err = storage.Update(obj)
		if err != nil {
			server.error(err, w)
			return
		}
		server.write(200, obj, w)
		return
	default:
		server.notFound(req, w)
	}
}
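
To make the delegation concrete, here is a hypothetical in-memory implementation of the RESTStorage interface quoted above (not part of the original commit); registered under a storage key in the ApiServer's storage map, it would be served at ${prefix}/${storage_key}[/${object_name}].

package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// MemoryStorage is a hypothetical in-memory RESTStorage implementation,
// only meant to illustrate how ApiServer delegates CRUD; it is not part
// of the original commit.
type MemoryStorage struct {
	items map[string]interface{}
}

func NewMemoryStorage() *MemoryStorage {
	return &MemoryStorage{items: map[string]interface{}{}}
}

func (m *MemoryStorage) List(u *url.URL) (interface{}, error) {
	result := []interface{}{}
	for _, item := range m.items {
		result = append(result, item)
	}
	return result, nil
}

func (m *MemoryStorage) Get(id string) (interface{}, error) {
	item, ok := m.items[id]
	if !ok {
		return nil, nil // ApiServer turns a nil object into a 404
	}
	return item, nil
}

func (m *MemoryStorage) Delete(id string) error {
	delete(m.items, id)
	return nil
}

// Extract parses the request body into a generic map.
func (m *MemoryStorage) Extract(body string) (interface{}, error) {
	obj := map[string]interface{}{}
	err := json.Unmarshal([]byte(body), &obj)
	return obj, err
}

func (m *MemoryStorage) Create(obj interface{}) error {
	item, ok := obj.(map[string]interface{})
	if !ok {
		return fmt.Errorf("unexpected object type %T", obj)
	}
	id, _ := item["id"].(string)
	m.items[id] = item
	return nil
}

func (m *MemoryStorage) Update(obj interface{}) error {
	return m.Create(obj)
}

func main() {
	storage := NewMemoryStorage()
	obj, _ := storage.Extract(`{"id": "task1", "image": "nginx"}`)
	storage.Create(obj)
	got, _ := storage.Get("task1")
	fmt.Printf("stored object: %v\n", got)
}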

Scheduler

Path: /home/akuma/SoopaProject/Study/k8s/k8s-first-source-code/pkg/registry
Commit: 2c4b3a562ce34cddc3f8218a2c4d11c7310e6d56

// Scheduler is an interface implemented by things that know how to schedule tasks onto machines.
type Scheduler interface {
	Schedule(Task) (string, error)
}

// RandomScheduler chooses machines uniformly at random.
type RandomScheduler struct {
	machines []string
	random   rand.Rand
}

func MakeRandomScheduler(machines []string, random rand.Rand) Scheduler {
	return &RandomScheduler{
		machines: machines,
		random:   random,
	}
}

func (s *RandomScheduler) Schedule(task Task) (string, error) {
	return s.machines[s.random.Int()%len(s.machines)], nil
}

// RoundRobinScheduler chooses machines in order.
type RoundRobinScheduler struct {
	machines     []string
	currentIndex int
}

func MakeRoundRobinScheduler(machines []string) Scheduler {
	return &RoundRobinScheduler{
		machines:     machines,
		currentIndex: 0,
	}
}

func (s *RoundRobinScheduler) Schedule(task Task) (string, error) {
	result := s.machines[s.currentIndex]
	s.currentIndex = (s.currentIndex + 1) % len(s.machines) // this is neat af 
	return result, nil
}

type FirstFitScheduler struct {
	machines []string
	registry TaskRegistry
}

func MakeFirstFitScheduler(machines []string, registry TaskRegistry) Scheduler {
	return &FirstFitScheduler{
		machines: machines,
		registry: registry,
	}
}

func (s *FirstFitScheduler) containsPort(task Task, port Port) bool {
	for _, container := range task.DesiredState.Manifest.Containers {
		for _, taskPort := range container.Ports {
			if taskPort.HostPort == port.HostPort {
				return true
			}
		}
	}
	return false
}

func (s *FirstFitScheduler) Schedule(task Task) (string, error) {
	machineToTasks := map[string][]Task{}
	tasks, err := s.registry.ListTasks(nil)
	if err != nil {
		return "", err
	}
	for _, scheduledTask := range tasks {
		host := scheduledTask.CurrentState.Host
		machineToTasks[host] = append(machineToTasks[host], scheduledTask)
	}
	for _, machine := range s.machines {
		taskFits := true
		for _, scheduledTask := range machineToTasks[machine] {
			for _, container := range task.DesiredState.Manifest.Containers {
				for _, port := range container.Ports {
					if s.containsPort(scheduledTask, port) {
						taskFits = false
					}
				}
			}
		}
		if taskFits {
			return machine, nil
		}
	}
	return "", fmt.Errorf("Failed to find fit for %#v", task)
}
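
The index rotation in RoundRobinScheduler is worth calling out; here is a standalone sketch of the same modulo trick, outside the Kubernetes types (names are made up):

package main

import "fmt"

// Standalone sketch (not from the original commit): the same modulo-based
// round-robin rotation used by RoundRobinScheduler above, over a plain slice.
type roundRobin struct {
	machines []string
	current  int
}

func (r *roundRobin) next() string {
	m := r.machines[r.current]
	r.current = (r.current + 1) % len(r.machines) // wrap around after the last machine
	return m
}

func main() {
	rr := &roundRobin{machines: []string{"node0", "node1", "node2"}}
	for i := 0; i < 5; i++ {
		fmt.Println(rr.next()) // node0, node1, node2, node0, node1
	}
}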


Controller Manager

ETCD / KV Store

The two ways of deploying etcd are:

  1. Stacked topology - etcd runs alongside all the other control plane components on the master node
  2. External topology - etcd is separated out into its own HA cluster

Strong consistency is achieved using the Raft consensus algorithm.
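
As a sketch of the external topology (endpoints and certificate paths are placeholders), a kubeadm ClusterConfiguration can point the API server at an external etcd cluster:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:                      # placeholder addresses of the external etcd members
    - https://10.0.0.10:2379
    - https://10.0.0.11:2379
    - https://10.0.0.12:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key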

Worker Nodes

Worker Node components:

Container Runtime

K8s requires a container runtime on every node where Pods are to be scheduled, so a runtime must be installed on all nodes. We can use the following runtimes: containerd, CRI-O, or Docker Engine (via cri-dockerd).

Kubelet

![[Pasted image 20230512233949.png]]

Kube Proxy

The kube-proxy is the network agent which runs on each node (control plane and workers), responsible for dynamic updates and maintenance of all networking rules on the node. It abstracts the details of Pod networking and forwards connection requests to the containers in the Pods.

The kube-proxy is responsible for TCP, UDP, and SCTP stream forwarding or random forwarding across a set of Pod backends of an application, and it implements forwarding rules defined by users through Service API objects.
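
For illustration, a minimal Service manifest (hypothetical names) of the kind whose forwarding rules kube-proxy programs on every node:

apiVersion: v1
kind: Service
metadata:
  name: nginx-svc        # hypothetical name
spec:
  selector:
    app: nginx           # forwards to Pods carrying this label
  ports:
  - protocol: TCP
    port: 80             # port exposed on the Service's ClusterIP
    targetPort: 80       # container port on the backend Pods

kube-proxy watches Services like this one and keeps the node's forwarding rules in sync, so traffic sent to the Service's ClusterIP and port reaches one of the matching Pods.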

Addons

Add-ons are cluster features and functionality not yet available in Kubernetes, and are therefore implemented through third-party pods and services.

Decoupled, microservices-based applications rely heavily on networking in order to mimic the tight coupling once available in the monolithic era. Networking in general is not the easiest thing to understand and implement, and Kubernetes is no exception: as a containerized microservices orchestrator it needs to address a few distinct networking challenges, listed below.

All these networking challenges must be addressed before deploying a Kubernetes cluster.

Networking Challenges

  1. Container-to-container communication inside Pods
  2. Pod-to-Pod communication on the same node and across cluster nodes
  3. Pod-to-Service communication within and across Namespaces
  4. External-to-Service communication for clients to access applications in the cluster

K8s Configuration

As we have seen, there are mainly three component groups in Kubernetes:

  1. Control Plane
  2. etcd
  3. Worker Nodes

This gives us six different combinations, of which one (etcd + Worker Nodes, with no Control Plane) is invalid. Ignoring that one, we have five ways to lay out our system:

  1. Control Plane + etcd + Worker Nodes - All-in-One: ideal for studying and testing locally
  2. (Control Plane + etcd) + multiple Worker Nodes - single control plane with a stacked etcd instance handling many workers
  3. Control Plane + external etcd + Worker Nodes - a more resilient layout in which etcd runs outside the control plane node
  4. Multi-Control Plane and Multi-Worker - highly resilient, but with a higher cost factor
  5. Multi-Control Plane with Multi-Node etcd and Multi-Worker - multiple control plane nodes configured in HA mode, each paired with an external etcd instance
K8s Deployment Tools
  1. Testing / Local / Non-Production:
    • Minikube
    • kind
    • Docker Desktop
    • MicroK8s
    • k3s
  2. Production Setup:
    • kubeadm
    • Kubespray
    • kops
API Accessing
%%{ init : { "theme" : "light", "flowchart" : { "curve" : "stepBefore" }}}%%
graph LR;
 A["/"] --> h["/healthz"];
 A --> m["/metrics"];
 A --> a["/api"];
 A --> ap["/apis"];
 A --> extras[....];
 a --> version["/api/v1"];
 version --> pods["/api/v1/pods"];
 version --> nodes["/api/v1/nodes"];
 version --> services["/api/v1/services"];
 version --> extra["...."];
 ap --> apps["/apis/apps"];
 ap --> apisextra["/apis/...."];
 apps --> v1["/apis/apps/v1"];
 apps --> v1beta1["/apis/apps/v1beta1"];
 v1 --> deployments["/apis/apps/v1/deployments"];
 v1 --> daemonset["/apis/apps/v1/daemonsets"];
 v1 --> sts["/apis/apps/v1/statefulsets"];
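
These groups and paths can be explored from a running cluster with kubectl's discovery commands (a small sketch; the exact output depends on the cluster version):

  1. kubectl api-versions                      # lists group/version pairs such as apps/v1
  2. kubectl api-resources --api-group=apps    # resources served under /apis/apps
  3. kubectl get --raw /api/v1                 # raw discovery document for the core group
  4. kubectl get --raw /apis/apps/v1           # raw discovery document for apps/v1
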
Kubernetes Building Blocks
Nodes

Nodes are virtual identities assigned by Kubernetes to the systems that are part of the cluster - whether virtual machines, bare metal, containers, etc. These identities are unique to each system and are used by the cluster for resource accounting and monitoring, which helps with workload management throughout the cluster.

Namespaces

If multiple users and teams use the same Kubernetes cluster, we can partition the cluster into virtual sub-clusters using Namespaces. The names of the resources/objects created inside a Namespace are unique within it, but not across Namespaces in the cluster.
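
A minimal sketch of creating such a sub-cluster (the namespace name is hypothetical):

apiVersion: v1
kind: Namespace
metadata:
  name: team-a        # hypothetical namespace name

Objects can then be targeted with --namespace=team-a (or -n team-a) on kubectl commands, as in the kube-system example earlier.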

Pods

A Pod is the smallest Kubernetes workload object. It is the unit of deployment in Kubernetes and represents a single instance of the application.

Example k8s Manifest
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    run: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.22.1
    ports:
    - containerPort: 80
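
Assuming the manifest above is saved as nginx-pod.yaml (hypothetical filename), it can be applied and inspected as follows:

  1. kubectl apply -f nginx-pod.yaml
  2. kubectl get pods -l run=nginx-pod
  3. kubectl describe pod nginx-pod
  4. kubectl delete -f nginx-pod.yaml
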
Labels

Labels are key-value pairs attached to Kubernetes objects (e.g. Pods, ReplicaSets, Nodes, Namespaces, Persistent Volumes). Labels are used to organize and select a subset of objects, based on the requirements in place. Many objects can have the same Label(s). Labels do not provide uniqueness to objects. Controllers use Labels to logically group together decoupled objects, rather than using objects’ names or IDs.

Label Selectors

Controllers (or operators) and Services use Label Selectors to select a subset of objects. Kubernetes supports two types of Selectors: equality-based Selectors and set-based Selectors.
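
As an illustration (label keys and values are hypothetical), the two selector styles can be expressed on the command line as:

  1. kubectl get pods -l 'app=guestbook,tier=frontend'                       # equality-based
  2. kubectl get pods -l 'tier in (frontend,cache),environment notin (dev)'  # set-based

In manifests, equality-based selection corresponds to matchLabels and set-based selection to matchExpressions.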

Replication Controller
ReplicaSet vs. ReplicationController

A ReplicaSet is the next-generation ReplicationController: both ensure that a specified number of Pod replicas is running at any time, but ReplicaSets support both equality-based and set-based selectors, while ReplicationControllers only support equality-based selectors.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
  template:
    metadata:
      labels:
        app: guestbook
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3
Deployments
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-deployment
  template:
    metadata:
      labels:
        app: nginx-deployment
    spec:
      containers:
      - name: nginx
        image: nginx:1.20.2
        ports:
        - containerPort: 80
DaemonSets
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-agent
  namespace: kube-system
  labels:
    k8s-app: fluentd-agent
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-agent
  template:
    metadata:
      labels:
        k8s-app: fluentd-agent
    spec:
      containers:
      - name: fluentd-agent
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
Kubernetes Services

Authentication, Authorization and Admission Control

To access and manage Kubernetes resources or objects in the cluster, we need to access a specific API endpoint on the API server. Each access request goes through the following access control stages: 

![[Pasted image 20230723112546.png]]

Authentication

There are many ways to authenticate; Kubernetes supports several authentication modules, such as X.509 client certificates, static token files, service account tokens, OpenID Connect tokens, and webhook token authentication.

Authorization

Once authenticated, users can send API requests. Similar to authentication, there are many "authorizers", which are authorization modules.

Attribute-Based Access Control (ABAC)
With the ABAC authorizer, Kubernetes grants access to API requests by combining policies with attributes. In the following example, user bob can only read Pods in the Namespace lfs158.

{
  "apiVersion": "abac.authorization.kubernetes.io/v1beta1",
  "kind": "Policy",
  "spec": {
    "user": "bob",
    "namespace": "lfs158",
    "resource": "pods",
    "readonly": true
  }
}

To enable ABAC mode, we start the API server with the --authorization-mode=ABAC option, while specifying the authorization policy with --authorization-policy-file=PolicyFile.json. For more details, please review the ABAC authorization documentation.

Webhook
In Webhook mode, Kubernetes can request authorization decisions to be made by third-party services, which would return true for successful authorization, and false for failure. In order to enable the Webhook authorizer, we need to start the API server with the --authorization-webhook-config-file=SOME_FILENAME option, where SOME_FILENAME is the configuration of the remote authorization service. For more details, please see the Webhook mode documentation.

Role-Based Access Control (RBAC)
In general, with RBAC we regulate access to resources based on the Roles of individual users. In Kubernetes, multiple Roles can be attached to subjects like users, service accounts, etc. While creating the Roles, we restrict resource access to specific operations, such as create, get, update, and patch. These operations are referred to as verbs. In RBAC, we can create two kinds of Roles: Role (scoped to a Namespace) and ClusterRole (cluster-wide).

In this course, we will focus on the first kind, Role. Below you will find an example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: lfs158
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

The manifest defines a pod-reader role, which has access only to read the Pods of the lfs158 Namespace. Once the role is created, we can bind it to users with a RoleBinding object. There are two kinds of RoleBindings: RoleBinding (scoped to a Namespace) and ClusterRoleBinding (cluster-wide).

In this course, we will focus on the first kind, RoleBinding. Below, you will find an example:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-read-access
  namespace: lfs158
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

The manifest defines a binding between the pod-reader Role and user bob, restricting the user to only reading the Pods of the lfs158 Namespace.

To enable the RBAC mode, we start the API server with the --authorization-mode=RBAC option, allowing us to dynamically configure policies. For more details, please review the RBAC mode documentation.
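
Assuming the Role and RoleBinding above have been applied, the effect can be checked with kubectl's impersonation support (this requires the caller to have impersonation rights):

  1. kubectl auth can-i get pods --namespace=lfs158 --as=bob      # yes
  2. kubectl auth can-i delete pods --namespace=lfs158 --as=bob   # no
  3. kubectl auth can-i get pods --namespace=default --as=bob     # no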

Services

Deploying Stand-Alone Application

Kubernetes Volume Management

ConfigMaps and Secrets

Ingress

An Ingress is a collection of rules that allow inbound connections to reach the cluster Services. To achieve this, Ingress configures a Layer 7 HTTP/HTTPS load balancer for Services and provides the following: TLS (Transport Layer Security), name-based virtual hosting, fanout routing, load balancing, and custom rules.

With Ingress, users do not connect directly to a Service. Users reach the Ingress endpoint, and, from there, the request is forwarded to the desired Service. You can see an example of a Name-Based Virtual Hosting Ingress definition below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
  name: virtual-host-ingress
  namespace: default
spec:
  rules:
  - host: blue.example.com
    http:
      paths:
      - backend:
          service:
            name: webserver-blue-svc
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  - host: green.example.com
    http:
      paths:
      - backend:
          service:
            name: webserver-green-svc
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific

In the example above, user requests to both blue.example.com and green.example.com would go to the same Ingress endpoint, and, from there, they would be forwarded to webserver-blue-svc, and webserver-green-svc, respectively.

This diagram presents a Name-Based Virtual Hosting Ingress rule: 

Name-Based Virtual Hosting Ingress

graph LR
A[Ingress] --> B[Fanout];
A --> C[VirtualHost based];

Advanced Topics

K8s Community