NPU Installation¶
LLaMA-Factory supports Huawei Ascend NPU (A2/A3) devices. You can choose one of the following three methods for environment setup and usage:
Core Dependencies¶
All installation methods depend on the following components:
HDK: Firmware and drivers
CANN: Heterogeneous Computing Architecture
torch_npu: Ascend adaptation plugin for PyTorch
Required steps vary depending on the installation method:
Manual installation: Manually install HDK, CANN, and torch_npu.
Docker image/build: The host only needs to install HDK (driver/firmware). CANN and torch_npu are integrated in the image.
Method 1: Manual Environment Installation¶
This method requires you to manually install HDK, CANN, and torch_npu.
1. Versions and Download Links¶
This document lists the latest dependency versions and download links. Please choose according to your device model:
| Device | Dependency | Link |
|---|---|---|
| A3 | HDK | |
| | CANN | https://www.hiascend.com/developer/download/community/result?module=cann&cann=9.0.0 |
| | torch_npu | 2.7.1.post4 |
| A2 | HDK | |
| | CANN | https://www.hiascend.com/developer/download/community/result?module=cann&cann=9.0.0 |
| | torch_npu | 2.7.1.post4 |
2. Drivers and Firmware¶
Choose the HDK installation package in .run or .deb format as appropriate, noting that packages are differentiated for aarch64 and x86.
The following uses the A2 series as an example. For the A3 series, the firmware and driver package names differ; choose based on the linked page.
A3 internal package names are similar to Atlas-A3-hdk-npu-driver_25.0.rc1.3_linux-aarch64.run and Atlas-A3-hdk-npu-firmware_7.7.0.3.228.run. The installation method does not change.
Upload installation packages: log in as root and upload the driver and firmware packages to the server (e.g., /home).
Add execution permissions: enter the package directory and execute the following commands.
chmod +x Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run
chmod +x Ascend-hdk-<chip_type>-npu-firmware_<version>.run
Install drivers and firmware. The default installation path is /usr/local/Ascend.
Install driver:
./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --full --install-for-all
If "Driver package installed successfully!" appears, the installation succeeded.
Install firmware:
./Ascend-hdk-<chip_type>-npu-firmware_<version>.run --full
If "Firmware package installed successfully!" appears, the installation succeeded.
Note
If the default user HwHiAiUser has not been created, specify the user and group in the installation command:
./Ascend-hdk-*.run --full --install-username=<username> --install-usergroup=<usergroup>
Decide whether to reboot according to prompts. To reboot:
reboot
Verify installation. Run the following command to check driver load status:
npu-smi info
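The driver check above can also be scripted, for example as a step in a provisioning job. The following is a minimal sketch, assuming only that npu-smi is on PATH after a successful driver installation; the function name npu_smi_report is hypothetical and not part of any Ascend tooling.

```python
import shutil
import subprocess


def npu_smi_report(binary="npu-smi"):
    """Return the text printed by `<binary> info`, or None if it cannot be run."""
    path = shutil.which(binary)
    if path is None:
        # The tool is not on PATH, so the driver is probably not installed.
        return None
    try:
        result = subprocess.run(
            [path, "info"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    report = npu_smi_report()
    if report is None:
        print("npu-smi did not run; check the driver installation")
    else:
        print(report)
```

The function only reports whether the command ran; interpreting the device table it prints is left to the reader.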
3. CANN¶
Choose the CANN installation package in .run or .deb format as appropriate, noting that packages are differentiated for aarch64 and x86.
The following uses the A2 series as an example. The only difference for the A3 series is the name of the ops package; choose according to the linked page. A3 internal package names are similar to Ascend-cann-A3-ops_9.0.0_linux-aarch64.run. The installation method does not change.
(1) Install Toolkit development kit¶
Toolkit is used for training, inference, and development.
Note
Ensure the installation directory has more than 10G of free space.
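The free-space requirement can be checked before running the installer. The sketch below uses Python's standard shutil.disk_usage and treats "10G" as 10 GiB, which is an assumption; the helper names are hypothetical. Because the installation directory may not exist yet, it measures the nearest existing parent directory.

```python
import os
import shutil

GIB = 1024 ** 3


def nearest_existing(path):
    """Walk up from `path` until a directory that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_gib(path):
    """Free space, in GiB, on the filesystem that would hold `path`."""
    return shutil.disk_usage(nearest_existing(path)).free / GIB


def has_room(path, required_gib=10):
    """True if more than `required_gib` GiB are free for `path`."""
    return free_gib(path) > required_gib


if __name__ == "__main__":
    target = "/usr/local/Ascend"  # use ${HOME}/Ascend for a normal user
    print(f"{target}: {free_gib(target):.1f} GiB free, enough: {has_room(target)}")
```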
Authorization and installation. The default installation path is /usr/local/Ascend for the root user and ${HOME}/Ascend for a normal user.
chmod +x Ascend-cann-toolkit_<version>_linux-aarch64.run
./Ascend-cann-toolkit_<version>_linux-aarch64.run --install
Configure environment variables. For root users, it is recommended to add the following line to ~/.bashrc.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
(2) Install ops operator package¶
Execute after installing Toolkit. To install static libraries, replace --install with --devel.
chmod +x Ascend-cann-<chip_type>-ops_<version>_linux-aarch64.run
./Ascend-cann-<chip_type>-ops_<version>_linux-aarch64.run --install
(3) Install NNAL neural network acceleration library (optional)¶
Includes ATB and SiP acceleration libraries. Execute after installing Toolkit.
Authorization and installation:
chmod +x Ascend-cann-nnal_<version>_linux-aarch64.run
./Ascend-cann-nnal_<version>_linux-aarch64.run --install
Configure environment variables:
(Choose one of the two, do not configure both)
# ATB
source ${HOME}/Ascend/nnal/atb/set_env.sh
# SiP
source ${HOME}/Ascend/nnal/asdsip/set_env.sh
4. torch-npu¶
It is recommended to install the torch-npu plugin together with LLaMA-Factory. The LLaMA-Factory dependency list is kept up to date with stable versions of the torch-npu plugin.
pip install -r requirements/npu.txt
You can also download and install the torch-npu plugin manually, for example:
pip install torch_npu-version-cp311-cp311-manylinux_2_17_aarch64.whl
Notes when installing the torch-npu plugin:
Each torch_npu package is built for a specific Python version. Choose the package that matches your environment. When running pip install torch_npu, the corresponding version of torch is installed together.
The versions of torch-npu and torch installed in the environment must be aligned. For example, if torch-npu is version 2.7.1, then torch must also be version 2.7.1. Dependency conflicts may occasionally change the torch version during installation, leading to errors.
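The alignment rule above can be checked after installation. This is a minimal sketch, assuming that "aligned" means the first three numeric components match (so torch 2.7.1 pairs with torch_npu 2.7.1.post4); that reading is an assumption drawn from the example above, and the function names are hypothetical.

```python
import re
from importlib import metadata


def release_triple(version):
    """Extract (major, minor, patch) from strings like '2.7.1.post4' or '2.7.1+cpu'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def versions_aligned(torch_version, torch_npu_version):
    """True if both versions share the same major.minor.patch release."""
    return release_triple(torch_version) == release_triple(torch_npu_version)


def installed_versions_aligned():
    """Check the installed packages; return None if either one is missing."""
    try:
        torch_version = metadata.version("torch")
        torch_npu_version = metadata.version("torch_npu")
    except metadata.PackageNotFoundError:
        return None
    return versions_aligned(torch_version, torch_npu_version)


if __name__ == "__main__":
    print(installed_versions_aligned())
```

Running it right after pip install makes it easy to spot a torch version that was changed by dependency resolution.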
5. Verify Installation¶
Run the following Python script:
import torch
import torch_npu
print(torch.npu.is_available())
Expected output: True
This indicates that HDK, CANN, and torch_npu are all installed correctly and functioning.
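When the script does not print True, it helps to know whether the import or the device check failed. The variant below is a sketch of the same check with error handling added; the function name npu_status and the returned messages are hypothetical.

```python
def npu_status():
    """Describe the result of the NPU availability check as a short string."""
    try:
        import torch
        import torch_npu  # noqa: F401  (same import as the script above)
    except Exception as exc:
        # Typical causes: torch_npu is not installed, or the CANN
        # environment variables have not been loaded in this shell.
        return f"import failed: {exc}"
    try:
        available = torch.npu.is_available()
    except Exception as exc:
        return f"availability check failed: {exc}"
    return "available" if available else "not available"


if __name__ == "__main__":
    print(npu_status())
```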
Method 2: Docker Pre-installed Image¶
Note
Ensure the host has installed firmware and drivers; refer to the previous section for installation.
LLaMA-Factory’s official images are hosted on Docker Hub and quay.io; the images are identical.
1. Pull Image¶
Download the latest image from the main branch (choose A2 or A3 according to your device). For specific version images, visit the image repository to check the tags.
# Docker Hub
docker pull hiyouga/llamafactory:latest-npu-a2
docker pull hiyouga/llamafactory:latest-npu-a3
# quay.io
docker pull quay.io/ascend/llamafactory:latest-npu-a2
docker pull quay.io/ascend/llamafactory:latest-npu-a3
2. Start Container¶
Start the container with the following command (modify DOCKER_IMAGE and device as appropriate):
CONTAINER_NAME=llama_factory_npu
DOCKER_IMAGE=hiyouga/llamafactory:latest-npu-a2
docker run -itd \
--cap-add=SYS_PTRACE \
--net=host \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
--shm-size=1200g \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /data:/data \
--name "$CONTAINER_NAME" \
"$DOCKER_IMAGE" \
/bin/bash
Note
Setting --privileged=true enables privileged mode, granting the container full access to hardware management devices (such as /dev/davinci_manager). This resolves driver initialization failures in multi-container scenarios caused by permission restrictions, ensuring NPU resources can be reused across containers.
Note: Without this parameter, subsequent containers may fail to access devices due to insufficient permissions after the first container occupies them. Given the broad permissions of privileged mode, evaluate security risks carefully before using it in production.
3. Enter Container¶
docker exec -it llama_factory_npu bash
Note
Mount specified NPU cards using --device /dev/davinci<N> (supports 0–7). Device numbers inside the container are automatically remapped (e.g., physical machine davinci6 -> container device 0).
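To mount only a subset of cards, the --device arguments can be generated instead of edited by hand. The sketch below is a hypothetical helper that builds the arguments for the chosen cards plus the three shared devices used in the docker run command in this document; it does not call Docker itself.

```python
# Shared devices taken from the docker run command in this document.
SHARED_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]


def device_flags(card_ids):
    """Build --device arguments for the given NPU card numbers (0 to 7)."""
    flags = []
    for card in sorted(set(card_ids)):
        if not 0 <= card <= 7:
            raise ValueError(f"card number out of range 0-7: {card}")
        flags.append(f"--device=/dev/davinci{card}")
    flags.extend(f"--device={device}" for device in SHARED_DEVICES)
    return flags


if __name__ == "__main__":
    # Print one continuation line per flag, ready to paste into docker run.
    for flag in device_flags([6, 7]):
        print(f"{flag} \\")
```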
After entering the container, you can start training with llamafactory-cli train. If the current shell has not loaded the Ascend environment variables, run source /usr/local/Ascend/ascend-toolkit/set_env.sh first.
Method 3: Docker Local Build¶
Note
Ensure the host has installed firmware and drivers.
LLaMA-Factory provides two build methods: 1. Build Using Docker Build and 2. Build Using Docker Compose.
1. Build Using Docker Build¶
Build image — execute at the project root:
# Ascend-A2
docker build -f ./docker/docker-npu/Dockerfile \
--build-arg INSTALL_DEEPSPEED=false \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
# Ascend-A3
docker build -f ./docker/docker-npu/Dockerfile \
--build-arg BASE_IMAGE=quay.io/ascend/cann:9.0.0-a3-ubuntu22.04-py3.11 \
--build-arg INSTALL_DEEPSPEED=false \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
Note
Modify the BASE_IMAGE parameter to specify other CANN versions (see ascend/cann).
Start container
CONTAINER_NAME=llama_factory_npu
DOCKER_IMAGE=llamafactory:latest
docker run -itd \
--cap-add=SYS_PTRACE \
--net=host \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
--shm-size=1200g \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /data:/data \
--name "$CONTAINER_NAME" \
"$DOCKER_IMAGE" \
/bin/bash
Note
Setting --privileged=true enables privileged mode, granting the container full access to hardware management devices (such as /dev/davinci_manager). This resolves driver initialization failures in multi-container scenarios caused by permission restrictions, ensuring NPU resources can be reused across containers.
Note: Without this parameter, subsequent containers may fail to access devices due to insufficient permissions after the first container occupies them. Given the broad permissions of privileged mode, evaluate security risks carefully before using it in production.
Enter container
docker exec -it llama_factory_npu bash
2. Build Using Docker Compose¶
Enter directory
cd docker/docker-npu
Build image and start container directly — choose commands based on device model:
# Ascend-A2
docker-compose up -d
# Ascend-A3
docker-compose --profile a3 up -d llamafactory-a3
Enter container
docker exec -it llamafactory-a2 bash
Note
Before building, check the devices list in docker-compose.yml. Currently only card 0 is mounted during build; modify as needed.