-
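The IUS repository does not support the arm64 architecture
When the image is built with a GitHub Actions workflow, the build fails if arm64 is among the target platforms. The workflow is as follows: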
name: Docker Image CI

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ./Dockerfile
          platforms: |
            linux/amd64
            linux/arm64
          push: true
          tags: ${{ secrets.DOCKERHUB_USERNAME }}/centos7.9-slurm22:latest
When platforms includes linux/arm64, the image build fails:
> [linux/arm64 2/7] RUN set -ex && yum makecache fast && yum -y update && yum -y install https://repo.ius.io/ius-release-el7.rpm && yum -y install munge munge-devel mariadb-server mariadb-devel mysql-devel gcc gcc-c++ python3 readline-devel perl-ExtUtils-MakeMaker pam-devel http-parser-devel json-c-devel libyaml-devel libjwt-devel wget git vim bzip2 make automake libtool supervisor psmisc openldap openldap-servers openldap-clients nss-pam-ldapd authconfig kde-l10n-Chinese glibc-common bash-completion openssh-server && yum clean all && rm -rf /var/cache/yum:
588.7  5. Configure the failing repository to be skipped, if it is unavailable.
588.7     Note that yum will try to contact the repo. when it runs most commands,
588.7     so will have to try and fail each time (and thus. yum will be be much
588.7     slower). If it is a very temporary problem though, this is often a nice
588.7     compromise:
588.7
588.7         yum-config-manager --save --setopt=ius.skip_if_unavailable=true
588.7
588.7 failure: repodata/repomd.xml from ius: [Errno 256] No more mirrors to try.
588.7 https://repo.ius.io/7/aarch64/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
------
Dockerfile:6
--------------------
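Cause: the IUS repository (https://repo.ius.io/ius-release-el7.rpm) does not support the arm64 architecture; the aarch64 repodata URL returns 404.
Fix: delete the linux/arm64 entry under platforms (line 31 of the workflow).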
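If arm64 has to stay in the build matrix, an alternative to dropping the platform is to skip the IUS repository on architectures it does not serve. The sketch below is only an illustration: the function names are hypothetical, and treating x86_64 as the only supported architecture is an assumption (the log above only proves that aarch64 is missing).

```shell
#!/bin/sh
# Hypothetical helper: succeed only for architectures where the IUS EL7
# repository is assumed to exist. Only the absence of aarch64 is confirmed
# by the build log; the x86_64-only list is an assumption.
ius_supported() {
    case "$1" in
        x86_64|amd64) return 0 ;;
        *)            return 1 ;;
    esac
}

# Install the IUS release package only on a supported architecture.
# Defined but not called here, because it needs root and network access.
install_ius_if_supported() {
    arch=$(uname -m)
    if ius_supported "$arch"; then
        yum -y install https://repo.ius.io/ius-release-el7.rpm
    else
        echo "skipping IUS: no repository for $arch" >&2
    fi
}
```

Packages that only come from IUS would still need another source on arm64, so this keeps the build going rather than making the two images identical.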
-
ERROR: failed to solve: circular dependency detected on stage: build
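Cause: a mistake in the Dockerfile. In a multi-stage build, the instruction just before the second FROM ended with a trailing \, so the FROM line was swallowed as a continuation of that instruction and the second stage was never declared. Presumably a later COPY --from=build then ended up inside the build stage itself, which is what the error reports.
Fix: remove the trailing \ from the last line before the second FROM.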
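A stray continuation like this is easy to miss by eye. A minimal sketch of a check that could run before docker build, assuming a POSIX awk is available; the function name is hypothetical and the check only looks for this one mistake:

```shell
#!/bin/sh
# Hypothetical helper: report every FROM whose preceding non-blank,
# non-comment line ends with a backslash. Returns non-zero if any is found.
check_from_continuation() {
    awk '
        /^[ \t]*$/ { next }                 # skip blank lines
        /^[ \t]*#/ { next }                 # skip comment lines
        toupper($1) == "FROM" && prev ~ /\\[ \t]*$/ {
            printf "line %d: FROM follows a line ending in a backslash\n", NR
            bad = 1
        }
        { prev = $0 }                       # remember the last meaningful line
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage: `check_from_continuation Dockerfile && docker build .` stops before the build when a FROM would be merged into the previous instruction.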
-
command not found during the Dockerfile build
$ cat Dockerfile
FROM hekai/centos7.9-jdk8u202

RUN set -ex \
    && yum makecache fast \
    && yum -y update \
    && yum -y install \
    mariadb-server \
    mariadb-devel \
    mysql-devel \
    && yum clean all \
    && rm -rf /var/cache/yum \
    # config mariadb
    && sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf \
    && /usr/bin/mysql_install_db --user=mysql &>/dev/null \
    && /usr/bin/mysqld_safe --user=mysql & &>/dev/null \
    && sleep 3s \
    && mysql -e "CREATE USER 'slurm'@'localhost' identified by 'password'" \
    && mysql -e "GRANT ALL ON slurm_acct_db.* to 'slurm'@'localhost' identified by 'password' with GRANT option" \
    && mysql -e "CREATE DATABASE slurm_acct_db"

CMD ["/bin/bash"]
The image build fails with:
$ docker build -t mariadb:v1 -f Dockerfile.b .
[+] Building 4.0s (5/5) FINISHED
 => [internal] load build definition from Dockerfile.b  0.1s
 => => transferring dockerfile: 832B  0.0s
 => [internal] load .dockerignore  0.1s
 => => transferring context: 109B  0.0s
 => [internal] load metadata for docker.io/hekai/centos7.9-jdk8u202:latest  0.0s
 => [1/2] FROM docker.io/hekai/centos7.9-jdk8u202  0.1s
 => ERROR [2/2] RUN set -ex && yum makecache fast && yum -y update && yum -y install mariadb-server ma  3.7s
------
 > [2/2] RUN set -ex && yum makecache fast && yum -y update && yum -y install mariadb-server mariadb-devel mysql-devel && yum clean all && rm -rf /var/cache/yum && sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf && /usr/bin/mysql_install_db --user=mysql &>/dev/null && /usr/bin/mysqld_safe --user=mysql & &>/dev/null && sleep 3s && mysql -e "CREATE USER 'slurm'@'localhost' identified by 'password'" && mysql -e "GRANT ALL ON slurm_acct_db.* to 'slurm'@'localhost' identified by 'password' with GRANT option" && mysql -e "CREATE DATABASE slurm_acct_db":
#0 0.519 + yum makecache fast
#0 1.193 Loaded plugins: fastestmirror, ovl
#0 1.567 Determining fastest mirrors
#0 2.744  * base: mirrors.huaweicloud.com
#0 2.745  * extras: mirrors.huaweicloud.com
#0 2.745  * updates: mirrors.huaweicloud.com
#0 3.524 /bin/sh: mysql: command not found
------
Dockerfile.b:4
--------------------
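Cause: the first guess was that Docker optimizes the RUN instruction so that mysql is invoked before it has been installed. The log points to the shell instead: the & after mysqld_safe ends the whole && chain that precedes it and runs that chain, yum install included, in the background. The foreground carries on with sleep 3s and then mysql, about 3 seconds in, while yum is still selecting mirrors.
Fix: put the MySQL installation and the configuration in separate RUN instructions, so that the installation has finished before the configuration starts.
The modified Dockerfile: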
$ cat Dockerfile
FROM hekai/centos7.9-jdk8u202

RUN set -ex \
    && yum makecache fast \
    && yum -y update \
    && yum -y install \
    mariadb-server \
    mariadb-devel \
    mysql-devel \
    && yum clean all \
    && rm -rf /var/cache/yum

RUN set -ex \
    # config mariadb
    && sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf \
    && /usr/bin/mysql_install_db --user=mysql &>/dev/null \
    && /usr/bin/mysqld_safe --user=mysql & &>/dev/null \
    && sleep 3s \
    && mysql -e "CREATE USER 'slurm'@'localhost' identified by 'password'" \
    && mysql -e "GRANT ALL ON slurm_acct_db.* to 'slurm'@'localhost' identified by 'password' with GRANT option" \
    && mysql -e "CREATE DATABASE slurm_acct_db"

CMD ["/bin/bash"]
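The failure is consistent with how the shell parses `a && b & c`: the `&` applies to the entire `a && b` list, not just to `b`. A small self-contained demonstration, with sleep and echo standing in for the yum and mysql steps (the function name is hypothetical):

```shell
#!/bin/sh
# Demonstrates that '&' backgrounds the whole preceding '&&' chain,
# not only the last command in it.
demo_background_chain() {
    log=$(mktemp)
    # Stand-in for "yum install ... && mysqld_safe &": the entire chain,
    # slow first step included, runs in the background.
    sleep 1 && echo "chain finished" >> "$log" &
    # Stand-in for "sleep 3s && mysql ...": the foreground continues at once.
    echo "foreground ran" >> "$log"
    wait                # let the background chain finish before reading the log
    cat "$log"
    rm -f "$log"
}

demo_background_chain
```

The function prints "foreground ran" before "chain finished". In the Dockerfile, wrapping only the server start in a subshell, for example `(/usr/bin/mysqld_safe --user=mysql >/dev/null 2>&1 &)`, would keep the rest of the chain in the foreground.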
-
Image build fails: cannot connect to MariaDB
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
The corresponding RUN instruction in the Dockerfile:
RUN set -ex \
    ... ...
    # config mariadb
    && sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf \
    && /usr/bin/mysql_install_db --user=mysql &>/dev/null \
    && /usr/bin/mysqld_safe --user=mysql & &>/dev/null \
    && sleep 3s \
    && mysql -e "CREATE USER 'slurm'@'localhost' identified by 'password'" \
    && mysql -e "GRANT ALL ON slurm_acct_db.* to 'slurm'@'localhost' identified by 'password' with GRANT option" \
    && mysql -e "CREATE DATABASE slurm_acct_db" \
    ... ...
    # clean
    && rm -rf /var/log/* /var/cache/* /tmp/*
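Raising the sleep to 100s did not help either.
Fix: move the MariaDB setup out into its own RUN instruction.
Cause: not determined at the time. A likely explanation: in the shell, the & after mysqld_safe puts the whole && chain before it in the background, so mysqld is only started once every earlier step of that RUN has finished, which can take longer than the sleep.
The modified Dockerfile: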
RUN set -ex \
    ... ...
    # clean
    && rm -rf /var/log/* /var/cache/* /tmp/*

RUN set -ex \
    # config mariadb
    && sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf \
    && /usr/bin/mysql_install_db --user=mysql &>/dev/null \
    && /usr/bin/mysqld_safe --user=mysql & &>/dev/null \
    && sleep 3s \
    && mysql -e "CREATE USER 'slurm'@'localhost' identified by 'password'" \
    && mysql -e "GRANT ALL ON slurm_acct_db.* to 'slurm'@'localhost' identified by 'password' with GRANT option" \
    && mysql -e "CREATE DATABASE slurm_acct_db"
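A fixed sleep is a guess about how long the server needs. A sketch of an alternative that polls for the socket file instead; wait_for_path is a hypothetical helper, and the 30 second default is an arbitrary assumption:

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists, polling once per second.
# Usage: wait_for_path PATH [TIMEOUT_SECONDS]
wait_for_path() {
    path=$1
    timeout=${2:-30}
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For the case above the call would be `wait_for_path /var/lib/mysql/mysql.sock 60` between starting mysqld_safe and the first mysql command. Inside a RUN instruction the loop would have to be inlined or shipped as a script, since the function is not defined there.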
-
Image build fails: MySQL fails to start
> [stage-1 5/7] RUN set -ex && sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf && /usr/bin/mysql_install_db --user=mysql &>/dev/null && /usr/bin/mysqld_safe --user=mysql & &>/dev/null && sleep 3s && mysql -e "CREATE USER 'slurm'@'localhost' identified by 'password'" && mysql -e "GRANT ALL ON slurm_acct_db.* to 'slurm'@'localhost' identified by 'password' with GRANT option" && mysql -e "CREATE DATABASE slurm_acct_db":
0.064 + sed -i '/\[mysqld\]/a\innodb_buffer_pool_size=1024M\ninnodb_log_file_size=64M\ninnodb_lock_wait_timeout=900' /etc/my.cnf
0.068 + /usr/bin/mysql_install_db --user=mysql
0.251 + /usr/bin/mysqld_safe --user=mysql
0.355 230905 03:18:57 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
0.386 230905 03:18:57 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
0.390 /usr/bin/mysqld_safe_helper: Can't create/write to file '/var/log/mariadb/mariadb.log' (Errcode: 2)
3.072 ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
------
Dockerfile.2:141
--------------------
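Cause: the /var/log/mariadb directory had been deleted by the earlier cleanup step, so mysqld_safe could not create its log file. When cleaning up log files, keep the mariadb directory: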
find /var/log/* | grep -v -e mariadb | xargs rm -rf
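The grep pipeline above keeps every path that merely contains "mariadb" and splits file names on whitespace. A sketch of a stricter variant using find alone, assuming GNU find for -mindepth and -maxdepth; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: delete everything directly under a log directory
# except the entry named "mariadb" (and whatever it contains).
clean_logs_keep_mariadb() {
    logdir=${1:-/var/log}
    find "$logdir" -mindepth 1 -maxdepth 1 ! -name mariadb -exec rm -rf {} +
}
```

Unlike the pipeline, this also leaves existing files inside the mariadb directory in place, which is harmless for the build but worth knowing if the goal is a minimal image.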
-
Configuring Slurm
The Slurm configure command is as follows:
./configure --prefix=/opt/slurm --sysconfdir=/opt/slurm/etc --enable-slurmrestd --with-mysql_config=/usr/bin --libdir=/usr/lib64
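With this configuration, make install places Slurm's shared libraries under /usr/lib64/slurm.
If --libdir is not specified at configure time, the shared libraries are installed under the directory given by --prefix: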
./configure --prefix=/opt/slurm --sysconfdir=/opt/slurm/etc --enable-slurmrestd --with-mysql_config=/usr/bin
In that case the shared libraries are installed under /opt/slurm/lib64.
For the shared libraries to be found as before, link /opt/slurm/lib64 into /usr/lib64:
ln -s /opt/slurm/lib64 /usr/lib64/slurm
-
Supporting the killall command
/usr/local/bin/docker-entrypoint.sh: line 77: killall: command not found
Fix: install psmisc
yum install -y psmisc
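On CentOS 7 minimal, running killall reports command not found because psmisc is not installed. The psmisc package contains programs that help manage the /proc directory: fuser, killall, pstree, and pstree.x11 (a link to pstree).
- fuser: displays the PIDs of processes using the specified files or file systems.
- killall: kills processes by name; it sends a signal to all processes running the specified command.
- pstree: displays the running processes as a tree.
- pstree.x11: same as pstree, except that it waits for confirmation before exiting.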
-
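Installing git
When IDEA connects to the Docker container for remote development, it reports:
Unsupported Git Version 1.8.3.1
At least 2.17.0 is required
The git available from the default yum repositories needs to be upgraded.
The centos7.9 Docker image does not include git, so there are two options: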
- Compile and install a newer git from source
- Install a git version newer than 2.17.0 from a yum repository
  - The IUS yum repository provides git 2.36.6
    - Note: installing the IUS repository also installs the EPEL repository automatically
    - Install the IUS repository:
      yum -y install https://repo.ius.io/ius-release-el7.rpm
    - List the git packages:
      yum search git|grep -E "^git"
    - Show the git version:
      yum info git236
    - Install git 2.36:
      yum -y install git236
  - The endpoint yum repository provides the latest git
    - Install the endpoint repository:
      yum install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo.x86_64.rpm
    - Install git:
      yum -y install git
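Whichever repository is used, it is worth checking that the installed git actually meets the 2.17.0 minimum IDEA asks for. A minimal sketch, assuming GNU coreutils for sort -V; both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2. Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from "git version X.Y.Z" output.
# Prints nothing if git is not installed.
installed_git_version() {
    git --version 2>/dev/null | awk '{ print $3 }'
}
```

Usage: `version_ge "$(installed_git_version)" 2.17.0 || echo "git too old"` could run at the end of the image build to catch a stale package early.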
-
Compiling and installing Python 3 (reference: https://www.jianshu.com/p/2cad40bc9e1b)
curl -O https://www.python.org/ftp/python/3.7.14/Python-3.7.14.tar.xz
tar -xf Python-3.7.14.tar.xz
cd Python-3.7.14
yum -y install make
yum-builddep -y python
./configure --prefix=/opt/tools/python-3.7.14
make && make install
-
Mounting /root in docker compose makes environment variables disappear
$ cat docker-compose.yml
version: '3'

services:
  slurm-master:
    image: hekai/centos7.9-slurm22
    hostname: linux0
    privileged: true
    stdin_open: true
    restart: always
    tty: true
    ports:
      - 389:389
      - 6820:6820
    environment:
      role: "master"
      TZ: Asia/Shanghai
    volumes:
      - .root:/root
      - .data/log:/var/log/slurm
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - spool_slurm:/var/spool/slurm
      - mysql:/var/lib/mysql

  slurm-compute-1:
    image: hekai/centos7.9-slurm22
    hostname: linux1
    privileged: true
    stdin_open: true
    restart: always
    tty: true
    environment:
      role: "compute"
      TZ: Asia/Shanghai
    volumes:
      - .data/log:/var/log/slurm
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
    depends_on:
      - "slurm-master"

volumes:
  etc_munge:
  etc_slurm:
  spool_slurm:
  mysql:
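As shown above, the slurm-master node mounts the container's /root to a local directory. After entering the container, the environment variables set when the image was built are gone. Running source /etc/profile inside the container loads the variables defined in /etc/profile.d/jdk.sh again. A likely explanation is that the bind mount hides the image's /root/.bashrc, which on CentOS sources /etc/bashrc and, through it, /etc/profile.d/*.sh for interactive shells.
- Adding source /etc/profile to the Dockerfile does not solve the problem.
- Adding source /etc/profile to entrypoint.sh does not solve the problem.
- Not mounting the container's /root directly, and instead mounting only the needed files under /root individually, solves the problem.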
-
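Look up users in LDAP first, and fall back to Linux system users if they are not found in LDAP
To do this, change the order of the values for the passwd, shadow, and group entries in /etc/nsswitch.conf.
Is there an elegant way to change these values?
After some searching, Linux does not seem to provide a command for reordering these values; the only way is to edit the text. Use sed to change the user lookup order in nsswitch.conf: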
sed -i 's/passwd: files ldap/passwd: ldap files/g' ./nsswitch.conf
sed -i 's/shadow: files ldap/shadow: ldap files/g' ./nsswitch.conf
sed -i 's/group: files ldap/group: ldap files/g' ./nsswitch.conf
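The commands above only match when exactly one space follows the colon, while nsswitch.conf files often align the values with several spaces. A sketch of a whitespace-tolerant variant, assuming GNU sed (-E and -i); prefer_ldap is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: move "ldap" ahead of "files" for the passwd, shadow
# and group entries, whatever whitespace separates the fields.
# The whitespace after the colon is preserved; other entries are untouched.
prefer_ldap() {
    sed -E -i \
        's/^(passwd|shadow|group):([[:space:]]+)files[[:space:]]+ldap/\1:\2ldap files/' \
        "$1"
}
```

Usage: `prefer_ldap /etc/nsswitch.conf`. Lines that list other sources between or instead of files and ldap are left as they are, so the result should still be checked by hand.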
References:
https://serverfault.com/questions/972401/try-ldap-authentication-before-local-authentication
https://documents.uow.edu.au/~blane/netapp/ontap/nag/networking/concept/c_oc_netw_maintaining_host_name_search.html#c_oc_netw_maintaining_host_name_search
https://superuser.com/questions/1417190/why-do-i-need-to-change-the-order-of-hosts-in-nsswitch-conf
https://man7.org/linux/man-pages/man5/nsswitch.conf.5.html
https://unix.stackexchange.com/questions/140378/editing-nsswitch-conf-file-safely