A brief introduction to the concept of operating system virtualization and the technology to achieve operating system virtualization

Operating system level virtualization

Virtualization technologies such as KVM and Xen give each virtual machine its own independent operating system. Operating system level virtualization, also known as containerization, takes a different approach: it is a feature of the operating system itself that allows multiple isolated user space instances to coexist on one kernel. These user space instances are called containers. An ordinary process can see all the resources of the computer, whereas a process in a container can see only the resources assigned to that container. In general, operating system level virtualization partitions the resources managed by the operating system (processes, files, devices, networks, and so on) into groups and hands each group to a different container, which is how it achieves isolation and virtualization.

Implementing operating system level virtualization on Linux relies on two kernel technologies: namespaces and control groups (cgroups).

Namespace

In programming languages, namespaces exist so that variable and routine names can be reused: the same name can appear in different namespaces without conflict. Namespaces in Linux serve a similar purpose. For example, on a Linux system without operating system level virtualization, user space process IDs (PIDs) are numbered from 1 in a single system-wide sequence. With operating system level virtualization, each container has its own PID namespace, so the processes in every container can be numbered from 1 without conflict.

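At the time of writing, Linux has six types of namespaces, each corresponding to a kind of resource managed by the operating system (newer kernels add more, such as the cgroup namespace that appears in the /proc listing later in this article):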

Mount points (mnt): CLONE_NEWNS

Processes (pid): CLONE_NEWPID

Network (net): CLONE_NEWNET

Interprocess communication (ipc): CLONE_NEWIPC

Hostname (uts): CLONE_NEWUTS

Users (user): CLONE_NEWUSER

At the time of writing, namespaces for further resources such as time and devices were expected in the future; a time namespace (CLONE_NEWTIME) has since been merged in Linux 5.6.

Linux 2.4.19 introduced the first namespace type, the mount namespace. Because no other namespace type existed at that time, the flag added to the clone system call was simply named CLONE_NEWNS ("new namespace").

Three namespace-related system calls

The following three system calls are used to manipulate namespaces:

clone(): creates a new process and, when CLONE_NEW* flags are given, new namespaces; the new process is placed in the new namespaces.

unshare(): creates new namespaces without creating a new process. The caller itself moves into the new namespaces, except for a PID namespace, which applies only to child processes created afterwards.

setns(): joins the calling process to an existing namespace.

Note: None of these three system calls changes the PID namespace of the calling process; for PID namespaces they affect only its child processes.

Ironically, namespaces themselves have no names. Each namespace is identified by an inode number, which is consistent with how Linux identifies files. The namespaces a process belongs to can be inspected in the proc file system. For example, to check the namespaces of the process with PID 4123:

kelvin@desktop:~$ ls -l /proc/4123/ns/
total 0
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 net -> net:[4026531963]
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 pid -> pid:[4026531836]
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 user -> user:[4026531837]
lrwxrwxrwx 1 kelvin kelvin 0 Dec 26 16:28 uts -> uts:[4026531838]

The following code demonstrates how to use the above three system calls to manipulate the process's namespace:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (10 * 1024 * 1024)

char child_stack[STACK_SIZE];

int child_main(void *args) {
    pid_t child_pid = getpid();
    // clone() placed this child in a newly created PID namespace, so its PID should be 1
    printf("I'm child process and my pid is %d\n", child_pid);
    // A namespace is destroyed once every process in it has exited,
    // so keep this child alive for the setns() step below
    sleep(300);
    return 0;
}

int main() {
    /* clone */
    // The stack grows downwards, so pass the top of the buffer
    pid_t child_pid = clone(child_main, child_stack + STACK_SIZE,
                            CLONE_NEWPID | SIGCHLD, NULL);
    if (child_pid < 0) {
        perror("clone failed");
        return 1; // the setns step below needs a valid child PID
    }

    /* unshare */
    // The parent calls unshare(), which creates a new PID namespace but no child process.
    // Child processes created afterwards are placed in the new namespace.
    int ret = unshare(CLONE_NEWPID);
    if (ret < 0) {
        perror("unshare failed");
    }
    pid_t fpid = fork();
    if (fpid < 0) {
        perror("fork error");
    } else if (fpid == 0) {
        // The forked child is placed in the namespace created by unshare(), so its PID should be 1
        printf("I am child process. My pid is %d\n", getpid());
        exit(0);
    }
    waitpid(fpid, NULL, 0);

    /* setns */
    char path[80] = "";
    snprintf(path, sizeof(path), "/proc/%d/ns/pid", child_pid);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        perror("open error");
    // setns() does not change the PID namespace of the current process,
    // only the PID namespace of child processes created afterwards
    if (setns(fd, 0) == -1)
        perror("setns error");
    close(fd);
    pid_t npid = fork();
    if (npid < 0) {
        perror("fork error");
    } else if (npid == 0) {
        // The new child joins the PID namespace of the first (cloned) child, so its PID should be 2
        printf("I am child process. My pid is %d\n", getpid());
        exit(0);
    }
    return 0;
}

Output:

$ sudo ./ns
I'm child process and my pid is 1
I am child process. My pid is 1
I am child process. My pid is 2

Control groups (cgroups)

Whereas namespaces isolate processes in terms of naming and numbering, control groups group processes and actually limit and isolate the computing resources each group may use. A control group is a kernel mechanism that groups processes and tracks the computing resources they consume. Each type of computing resource is managed by a so-called subsystem. The subsystems currently available include:

cpuset: assigns a set of CPUs to a cgroup; the processes in the cgroup are scheduled only on those CPUs.

blkio: limits and accounts for the block I/O of a cgroup.

cpuacct: reports the CPU usage of a cgroup.

devices: controls, through allow and deny lists, which device nodes the processes in a cgroup may create and use.

freezer: suspends a cgroup or resumes a suspended cgroup.

hugetlb: limits the use of huge pages (hugetlb) by a cgroup.

memory: tracks and limits the memory and swap used by a cgroup.

net_cls: tags network packets according to the sender's cgroup; the traffic controller assigns priorities based on these tags.

net_prio: sets the network traffic priority of a cgroup.

cpu: sets the CPU scheduling parameters of a cgroup.

perf_event: enables performance monitoring of the processes in a cgroup.

Unlike namespaces, control groups do not add any system calls. Instead, the kernel implements a file system through which control groups are managed with ordinary file and directory operations. The following example uses the cpuset subsystem to bind a process to a specified CPU.

1. Create a shell script, run.sh, that runs forever

#!/bin/bash
x=0
while true; do
    x=$((x + 1))
done

2. Run the script in the background

# bash run.sh &
[1] 20553

3. See which CPU the script is running on

# ps -eLo ruser,lwp,psr,args | grep 20553 | grep -v grep
root     20553   3 bash run.sh

The process with PID 20553 is running on CPU 3. The following steps use cgroups to bind it to CPU 2.

4. Mount a file system of type cgroup on a newly created directory named cgroups

# mkdir cgroups
# mount -t cgroup -o cpuset cgroups ./cgroups/
# ls cgroups/
cgroup.clone_children  cpuset.memory_pressure_enabled
cgroup.procs           cpuset.memory_spread_page
cgroup.sane_behavior   cpuset.memory_spread_slab
cpuset.cpu_exclusive   cpuset.mems
cpuset.cpus            cpuset.sched_load_balance
cpuset.effective_cpus  cpuset.sched_relax_domain_level
cpuset.effective_mems  docker
cpuset.mem_exclusive   tasks
cpuset.mem_hardwall    notify_on_release
cpuset.memory_migrate  release_agent
cpuset.memory_pressure

5. Create a new group, group0, inside the mounted hierarchy

# cd cgroups
# mkdir group0
# ls group0/
cgroup.clone_children  cpuset.mem_exclusive       cpuset.mems
cgroup.procs           cpuset.mem_hardwall        cpuset.sched_load_balance
cpuset.cpu_exclusive   cpuset.memory_migrate      cpuset.sched_relax_domain_level
cpuset.cpus            cpuset.memory_pressure     notify_on_release
cpuset.effective_cpus  cpuset.memory_spread_page  tasks
cpuset.effective_mems  cpuset.memory_spread_slab

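6. Add process 20553 to the newly created control group

# echo 20553 >> group0/tasks
# cat group0/tasks
20553

Note: on many systems the kernel rejects this write with "No space left on device" while the group's cpuset.cpus and cpuset.mems are still empty. In that case, perform step 7 first and also populate cpuset.mems (for example, echo 0 > group0/cpuset.mems) before adding the process.

7. Restrict the processes in this group to CPU 2

# echo 2 > group0/cpuset.cpus
# cat group0/cpuset.cpus
2

8. Check which CPU the process with PID 20553 is running on now

# ps -eLo ruser,lwp,psr,args | grep 20553 | grep -v grep
root     20553   2 bash run.sh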

The cpuset walkthrough above shows only the basics of using control groups. Control groups are operated through files and directories, and a file system is a tree, so without some constraints on how cgroups are used, configurations can become extremely complicated and confusing. For this reason, the newer version of cgroups (cgroup v2) imposes some restrictions, such as a single unified hierarchy.

Summary

This article briefly introduced the concept of operating system level virtualization and the two technologies used to implement it: namespaces and control groups. The use of each was demonstrated with a simple example.