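Ubuntu: nvidia-smi fails with Failed to initialize NVML: Driver/library version mismatch

On Ubuntu, nvidia-smi can fail right after a GPU driver is installed with: Failed to initialize NVML: Driver/library version mismatch. The error means the NVIDIA kernel module still loaded in memory does not match the newly installed user-space libraries.
The quickest fix is to reboot the system.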

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$ sudo reboot

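If you would rather not reboot, try unloading the NVIDIA kernel modules. rmmod refuses to remove a module that is still in use, so stop any display server or process that holds the GPU first.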

$ sudo rmmod nvidia_drm
$ sudo rmmod nvidia_modeset
$ sudo rmmod nvidia_uvm
$ sudo rmmod nvidia
# Verify that the modules are unloaded: no output means success.
$ lsmod | grep nvidia
$ nvidia-smi
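
Before unloading anything, it can help to confirm which versions are involved. The sketch below compares the version of the module currently loaded (read from /proc/driver/nvidia/version) with the version of the nvidia module installed on disk (from modinfo). The function names are made up for illustration, and it assumes the on-disk module was installed by the same package as the user-space libraries.

```shell
#!/bin/sh
# Sketch: compare the loaded NVIDIA kernel module with the one installed on disk.

# Pull the first dotted version number (e.g. 535.129.03) out of text on stdin.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Print "match" or "mismatch" for two version strings.
compare_versions() {
    if [ "$1" = "$2" ]; then echo "match"; else echo "mismatch"; fi
}

# Hypothetical helper: report both versions and whether they agree.
check_nvidia_versions() {
    loaded=""
    if [ -r /proc/driver/nvidia/version ]; then
        loaded=$(grep 'NVRM version' /proc/driver/nvidia/version | extract_version)
    fi
    installed=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded=${loaded:-none} installed=${installed:-unknown}"
    compare_versions "$loaded" "$installed"
}

# Usage, not run automatically:
#   check_nvidia_versions
```

A "mismatch" result is the situation in which reloading the modules or rebooting is expected to help.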

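Linux: df reports the disk is full, but du -sh shows free space

On Linux, df reports that the root filesystem is full, yet the total that du -sh computes for / is smaller than the disk capacity.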

root@ax:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
...
/dev/mapper/ubuntu--vg-ubuntu--lv ext4 36G 36G 0G 100% /

root@ax:~# du -sh /
22G /

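The most likely cause is external storage mounted under /mnt: files written into /mnt on the root disk before the mount are hidden by it, so du cannot see them while df still counts their blocks.
Try the following.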

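# Bind-mount the root filesystem onto another directory and measure it there; nothing in use has to be unmounted, which limits the impact on running services.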
root@ax:~# mkdir /mnt/tmp
root@ax:~# mount -o bind / /mnt/tmp
root@ax:~# du -sh /mnt/tmp
36G /mnt/tmp
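# Inspection showed that the files that already existed in /mnt on the root filesystem were too large and had filled the disk.
# The filesystem mounted over /mnt hid those files, which skewed the usage total. Deleting them fixed the problem.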
root@ax:~# du -sh /mnt/tmp/*
...
10G /mnt/tmp/mnt
root@ax:~# umount /mnt/tmp
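
The check above can be narrowed down before bind-mounting anything. The following is a minimal sketch with hypothetical helper names: one function lists mount points (any of them could be hiding files on the root disk), the other computes how many bytes df counts that du cannot see. The usage lines assume GNU df and du.

```shell
#!/bin/sh
# Sketch: find mount points that may be hiding files on the root filesystem.

# Read /proc/mounts-style lines on stdin and print every mount point except "/".
list_mount_points() {
    awk '$2 != "/" { print $2 }'
}

# Difference in bytes between what df reports as used and what du can see.
hidden_bytes() {
    echo $(( $1 - $2 ))
}

# Usage (as root), not run automatically:
#   list_mount_points < /proc/mounts
#   hidden_bytes "$(df -B1 --output=used / | tail -n 1)" "$(du -sxB1 / | cut -f1)"
```

The -x flag keeps du on the root filesystem, so its total is comparable with the df figure for /. The mount list also contains pseudo-filesystems such as /proc and /sys, which can be ignored.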

Setting up GitHub on macOS | Latest for 2024

1. Set the global Git user name and email

$ git config --global user.name "your name"
$ git config --global user.email "your email like email@example.com"

2. Generate a new SSH key

$ ssh-keygen -t ed25519 -C "email@example.com"
Generating public/private ed25519 key pair.
Enter file in which to save the key (/Users/me/.ssh/id_ed25519): /Users/me/.ssh/github.key
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/me/.ssh/github.key
Your public key has been saved in /Users/me/.ssh/github.key.pub
The key fingerprint is:
SHA256:xxxxxxxxxxxxxxxxx email@example.com
The key's randomart image is:
+--[ED25519 256]--+
| o+o. |
| .. +== |
| =o. Ooo |
| + oo oBEo |
| S .*oo o|
| X =.o o.|
| o = o|
| ... . o. |
| oo.o....o |
+----[SHA256]-----+

3. Open the SSH keys page in the GitHub settings, paste in the contents of /Users/me/.ssh/github.key.pub, and save the key.

4. Configure the SSH connection and test it.

$ vi ~/.ssh/config
Host github.com
HostName github.com
IdentityFile ~/.ssh/github.key
User git
$ ssh -T github.com
Hi xxx! You've successfully authenticated, but GitHub does not provide shell access.
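
Instead of editing ~/.ssh/config by hand in vi, the same Host block can be appended from a script. This is a minimal sketch: the function name is made up for illustration, and it only checks for an exact "Host github.com" line before appending, so running it twice does not duplicate the block.

```shell
#!/bin/sh
# Sketch: append the github.com Host block to an ssh config file, once.

add_github_host() {
    config_file=$1
    key_file=$2
    # Skip if a Host entry for github.com is already present.
    if [ -f "$config_file" ] && grep -qE '^Host[[:space:]]+github\.com$' "$config_file"; then
        return 0
    fi
    cat >> "$config_file" <<EOF
Host github.com
    HostName github.com
    IdentityFile $key_file
    User git
EOF
    # ssh expects the config file to be writable by the owner only.
    chmod 600 "$config_file"
}

# Usage, not run automatically:
#   add_github_host ~/.ssh/config ~/.ssh/github.key
```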

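CentOS 7 end of life: yum fails with Could not resolve host: mirrorlist.centos.org; Name or service not known

CentOS 7 reached end of life (EOL) on 2024/6/30 and lost official support. The mirror addresses are no longer reachable, so yum commands fail.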

$ sudo yum install telnet
Loaded plugins: fastestmirror
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=genclo error was
14: curl#6 - "Could not resolve host: mirrorlist.centos.org; Name or service not known"

The CentOS package repositories have moved to the Vault mirror. Editing /etc/yum.repos.d/CentOS-Base.repo to point there resolves the error.

$ sudo sed -i 's|mirrorlist=http://mirrorlist.centos.org|#mirrorlist=http://mirrorlist.centos.org|' /etc/yum.repos.d/CentOS-Base.repo
$ sudo sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' /etc/yum.repos.d/CentOS-Base.repo


$ cat /etc/yum.repos.d/CentOS-Base.repo
# Contents after the change

[base]
name=CentOS-$releasever - Base
baseurl=http://vault.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#released updates
[updates]
name=CentOS-$releasever - Updates
baseurl=http://vault.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
baseurl=http://vault.centos.org/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://vault.centos.org/centos/$releasever/centosplus/$basearch/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

# Clear the cache; yum can then install packages again
$ sudo yum clean all
$ sudo yum update
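
The two sed edits can be wrapped in one function that also keeps a backup of the repo file. This is a minimal sketch with a made-up function name. Unlike the commands above, the patterns are anchored at the start of the line, so a second run does not comment the mirrorlist line twice.

```shell
#!/bin/sh
# Sketch: point a CentOS-Base.repo style file at vault.centos.org, keeping a .bak copy.

switch_repo_to_vault() {
    repo_file=$1
    sed -i.bak \
        -e 's|^mirrorlist=http://mirrorlist.centos.org|#mirrorlist=http://mirrorlist.centos.org|' \
        -e 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' \
        "$repo_file"
}

# Usage (as root), not run automatically:
#   switch_repo_to_vault /etc/yum.repos.d/CentOS-Base.repo
```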

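Docker container is running, but the service is unreachable through its port: Connection Refused

The container is running and the port mapping is correct, but the service cannot be reached through the published port. Try the following.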

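# Bind the service inside the container to 0.0.0.0 instead of 127.0.0.1
# Why: a service listening on 0.0.0.0 accepts connections on every network interface in the container, including the interface Docker assigns to it. That lets external requests reach the application through Docker's networking. A service bound to 127.0.0.1 is reachable only from inside the container.

# Example 1: Nginx configuration (nginx.conf):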
server {
listen 0.0.0.0:80;
...
}

# Example 2: Flask (Python) can take the address on the command line
flask run --host=0.0.0.0

# Example 3: set it in the Dockerfile
CMD ["sh", "start.sh"]
# start.sh
python app.py --host=0.0.0.0
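
To confirm which address a service is actually bound to, the listening sockets inside the container can be inspected. The sketch below classifies an address as it appears in the Local Address:Port column of ss -ltn. The function name and the container name my-container are hypothetical, and the usage assumes ss is installed in the container image.

```shell
#!/bin/sh
# Sketch: tell loopback-only listeners apart from ones reachable through a port mapping.

classify_listen_addr() {
    case "$1" in
        127.*|"[::1]":*|localhost:*) echo "loopback-only" ;;
        0.0.0.0:*|"*":*|"[::]":*)    echo "all-interfaces" ;;
        *)                           echo "specific-interface" ;;
    esac
}

# Usage, not run automatically:
#   docker exec my-container ss -ltn | awk 'NR > 1 { print $4 }' | while read -r addr; do
#       echo "$addr -> $(classify_listen_addr "$addr")"
#   done
```

A port that shows up as loopback-only is the case described above: the mapping exists, but connections forwarded to the container are refused.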

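Converting a HuggingFace model to GGUF and importing it into ollama | AI quick start 2024

To run a HuggingFace model in ollama, convert it to GGUF format, then write a model definition file and import it into ollama.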

# Download the HuggingFace model (example: Qwen2-7B)
git lfs install
git clone https://huggingface.co/Qwen/Qwen2-7B

# In the directory that contains Qwen2-7B, download llama.cpp for the conversion and install the required Python packages
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
cd ..
pip install -r llama.cpp/requirements.txt

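# Convert Qwen2-7B to an unquantized GGUF first, so that different quantization types can be tried and compared later
# (tool names depend on the llama.cpp version: newer checkouts ship convert_hf_to_gguf.py and llama-quantize instead)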
python llama.cpp/convert.py Qwen2-7B/ --outfile Qwen2-7B.gguf
# Quantize the generated GGUF model to the chosen type (example: Q4_K_M) so that it runs on lower-end GPUs
./llama.cpp/quantize Qwen2-7B.gguf Qwen2-7B-Q4_K_M.gguf Q4_K_M

# If you want f32, f16, or q8_0 directly, the two steps above can be merged into one (example: q8_0)
python llama.cpp/convert.py Qwen2-7B/ --outfile Qwen2-7B-q8_0.gguf --outtype q8_0

# Write the model definition file for ollama, next to the .gguf file so that the relative FROM path resolves
touch Qwen2-7B-Q4_K_M.Modelfile
vi Qwen2-7B-Q4_K_M.Modelfile
# File contents
FROM ./Qwen2-7B-Q4_K_M.gguf

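# TODO: work out how to write the template; the [INST] form below is a placeholder and may not match the chat format Qwen2 expects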
TEMPLATE """[INST] {{ .System }}
{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"


# Import into ollama; on success the model appears in the list
ollama create Qwen2-7B-Q4_K_M -f Qwen2-7B-Q4_K_M.Modelfile
ollama list
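
Since the unquantized GGUF is kept for comparing quantization types, the quantize step can be looped. This is a minimal sketch with made-up function names: it follows the naming used above and calls the same ./llama.cpp/quantize binary, whose path is an assumption that depends on the llama.cpp version built.

```shell
#!/bin/sh
# Sketch: produce several quantized variants from one unquantized GGUF file.

# Build the output file name for a model and quantization type.
quantized_name() {
    echo "$1-$2.gguf"
}

# Quantize <model>.gguf once per type listed after the model name.
quantize_all() {
    model=$1
    shift
    for qtype in "$@"; do
        ./llama.cpp/quantize "$model.gguf" "$(quantized_name "$model" "$qtype")" "$qtype"
    done
}

# Usage, not run automatically:
#   quantize_all Qwen2-7B Q4_K_M Q5_K_M Q8_0
```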

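Two ways to get a path from the macOS Finder

In the macOS terminal, tabbing through a path one level at a time is inefficient.
Try either of the two methods below to get the path from Finder and paste it straight into the terminal.

  • Show the Finder path bar and copy from it
    In the Finder menu, click [View] -> [Show Path Bar].
    The path of the current file or folder then appears at the bottom of the Finder window; right-click it to copy the pathname.

  • Select a file or folder, open its right-click menu, and hold the option key: the Copy item turns into a copy-as-pathname item. Click it.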

leapp upgrade fails: Message: DNF execution failed with non zero exit code.

Symptom: leapp upgrade fails when upgrading CentOS 7 to Rocky Linux 8.

The main error messages are:

  1. Message: DNF execution failed with non zero exit code.
  2. Repository extras is listed more than once in the configuration
  3. python3-six-1.11.0-8.el8.noarch conflicts with file from package python36-six-1.14.0-3.el7.noarch
    The full log follows.
    [root@as ~]# leapp upgrade
    ...
    ============================================================
    ERRORS
    ============================================================

    2024-05-23 18:47:52.063859 [ERROR] Actor: dnf_package_download
    Message: DNF execution failed with non zero exit code.
    STDOUT:
    Last metadata expiration check: 0:01:24 ago on Thu May 23 18:44:59 2024.
    Package python2-six-1.9.0-0.el7.noarch is already installed.
    Dependencies resolved.

    ...

    Running transaction test
    The downloaded packages were saved in cache until the next successful transaction.
    You can remove cached packages by executing 'dnf clean packages'.

    STDERR:
    Failed to create directory /var/lib/leapp/el8userspace//sys/fs/selinux: Read-only file system
    Failed to create directory /var/lib/leapp/el8userspace//sys/fs/selinux: Read-only file system
    No matches found for the following disable plugin patterns: subscription-manager
    Repository extras is listed more than once in the configuration
    Warning: Package marked by Leapp to upgrade not found in repositories metadata: gpg-pubkey
    RPM: warning: Generating 6 missing index(es), please wait...
    Error: Transaction test error:
    file /usr/lib/python3.6/site-packages/__pycache__/six.cpython-36.opt-1.pyc from install of python3-six-1.11.0-8.el8.noarch conflicts with file from package python36-six-1.14.0-3.el7.noarch
    file /usr/lib/python3.6/site-packages/__pycache__/six.cpython-36.pyc from install of python3-six-1.11.0-8.el8.noarch conflicts with file from package python36-six-1.14.0-3.el7.noarch
    file /usr/lib/python3.6/site-packages/six.py from install of python3-six-1.11.0-8.el8.noarch conflicts with file from package python36-six-1.14.0-3.el7.noarch
    file /usr/lib/python3.6/site-packages/urllib3/__init__.py from install of python3-urllib3-1.24.2-7.el8.noarch conflicts with file from package python36-urllib3-1.25.6-2.el7.noarch
    file /usr/lib/python3.6/site-packages/urllib3/__pycache__/__init__.cpython-36.opt-1.pyc from install of python3-urllib3-1.24.2-7.el8.noarch conflicts with file from package python36-urllib3-1.25.6-2.el7.noarch
    file /usr/lib/python3.6/site-packages/urllib3/__pycache__/__init__.cpython-36.pyc from install of python3-urllib3-1.24.2-7.el8.noarch conflicts with file from package python36-urllib3-1.25.6-2.el7.noarch
    ...
    file /usr/lib/python3.6/site-packages/chardet/cli/__pycache__/chardetect.cpython-36.opt-1.pyc from install of python3-chardet-3.0.4-7.el8.noarch conflicts with file from package python36-chardet-3.0.4-1.el7.noarch
    file /usr/lib/python3.6/site-packages/chardet/cli/__pycache__/chardetect.cpython-36.pyc from install of python3-chardet-3.0.4-7.el8.noarch conflicts with file from package python36-chardet-3.0.4-1.el7.noarch
    file /usr/lib/python3.6/site-packages/chardet/cli/chardetect.py from install of python3-chardet-3.0.4-7.el8.noarch conflicts with file from package python36-chardet-3.0.4-1.el7.noarch
    file /usr/lib/python3.6/site-packages/requests/__init__.py from install of python3-requests-2.20.0-4.el8.noarch conflicts with file from package python36-requests-2.14.2-2.el7.noarch
    file /usr/lib/python3.6/site-packages/requests/__pycache__/__init__.cpython-36.opt-1.pyc from install of python3-requests-2.20.0-4.el8.noarch conflicts with file from package python36-requests-2.14.2-2.el7.noarch
    file /usr/lib/python3.6/site-packages/requests/__pycache__/__init__.cpython-36.pyc from install of python3-requests-2.20.0-4.el8.noarch conflicts with file from package python36-requests-2.14.2-2.el7.noarch
    ...
    file /usr/lib64/python3.6/site-packages/_yaml.cpython-36m-x86_64-linux-gnu.so from install of python3-pyyaml-3.12-12.el8.x86_64 conflicts with file from package python36-PyYAML-3.13-1.el7.x86_64
    file /usr/lib64/python3.6/site-packages/yaml/__init__.py from install of python3-pyyaml-3.12-12.el8.x86_64 conflicts with file from package python36-PyYAML-3.13-1.el7.x86_64
    file /usr/lib64/python3.6/site-packages/yaml/__pycache__/__init__.cpython-36.opt-1.pyc from install of python3-pyyaml-3.12-12.el8.x86_64 conflicts with file from package python36-PyYAML-3.13-1.el7.x86_64
    ...

    ============================================================
    END OF ERRORS
    ============================================================


    Debug output written to /var/log/leapp/leapp-upgrade.log

    ============================================================
    REPORT
    ============================================================

    A report has been generated at /var/log/leapp/leapp-report.json
    A report has been generated at /var/log/leapp/leapp-report.txt

    ============================================================
    END OF REPORT
    ============================================================

    Answerfile has been generated at /var/log/leapp/answerfile
    2024-05-23 18:47:52.575 ERROR PID: 28176 leapp: Upgrade workflow failed, check log for details

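Root cause: the error message says that DNF returned a non-zero exit code while installing packages, which raised the failure.

The conflicts are between Python packages: the first package in each message is built for el8 and the second for el7. My guess is that the upgrade tries to install the el8 python3-six package,
which conflicts with the el7 package already installed. Going by the package names shown in the log, I temporarily removed those packages; the error went away and the upgrade succeeded.
[python3-six-1.11.0-8.el8.noarch conflicts with file from package python36-six-1.14.0-3.el7.noarch]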

[root@as ~]# yum remove -y python36-six python36-urllib3 python36-chardet python36-requests python36-PyYAML
[root@as ~]# leapp upgrade
...
DNF will only download packages, install gpg keys, and check the transaction.
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Complete!
====> * add_upgrade_boot_entry
Add new boot entry for Leapp provided initramfs.
A reboot is required to continue. Please reboot your system.


Debug output written to /var/log/leapp/leapp-upgrade.log

============================================================
REPORT
============================================================

A report has been generated at /var/log/leapp/leapp-report.json
A report has been generated at /var/log/leapp/leapp-report.txt

============================================================
END OF REPORT
============================================================

Answerfile has been generated at /var/log/leapp/answerfile
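
The package list passed to yum remove above was read off the log by hand. With many conflicts, the names can be extracted instead. This is a minimal sketch with a made-up function name. It assumes the conflict lines end with the installed package in name-version-release.arch form, as in the log shown earlier, and that the same lines appear in /var/log/leapp/leapp-upgrade.log.

```shell
#!/bin/sh
# Sketch: list the installed packages named in "conflicts with file from package" lines.

conflicting_packages() {
    # Keep only the package at the end of each conflict line,
    # strip "-version-release.arch", and de-duplicate.
    sed -n 's/.*conflicts with file from package \([^ ]*\)[[:space:]]*$/\1/p' \
        | sed 's/-[^-]*-[^-]*$//' \
        | sort -u
}

# Usage, not run automatically:
#   conflicting_packages < /var/log/leapp/leapp-upgrade.log
```

Review the list before removing anything, since other installed software may depend on these packages.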