NPU Installation¶

This document describes how to prepare the LLaMA-Factory environment on Huawei Ascend NPUs. It currently focuses on Atlas A2/A3 training series devices. Before installation, confirm the hardware model and operating system compatibility, and then choose the appropriate deployment method.

Hardware Compatibility and Supported Operating Systems¶

Table 1 Hardware Support List

Product	Supported
Ascend 950 Series Products	√
Atlas A3 Training Series Products	√
Atlas A3 Inference Series Products	x
Atlas A2 Training Series Products	√
Atlas A2 Inference Series Products	x
Atlas 200I/500 A2 Inference Products	x
Atlas Inference Series Products	x
Atlas Training Series Products	x

Note

In this table, “√” indicates supported and “x” indicates not supported.

For operating systems supported by each hardware product in physical-machine deployment scenarios, see the Compatibility Query Assistant.
For operating systems supported by each hardware product in VM and container deployment scenarios, see the “Operating System Compatibility Description” section of CANN Software Installation (commercial edition or community edition).

After confirming that the hardware and operating system meet the preceding requirements, choose one of the following three methods for environment setup and usage:

Method 1: Manual Environment Installation
Method 2: Docker Pre-installed Image
Method 3: Docker Local Build

Core Dependencies¶

All installation methods depend on the following components:

HDK: Firmware and drivers
CANN: Heterogeneous Computing Architecture
torch_npu: Ascend adaptation plugin for PyTorch

Required steps vary depending on the installation method:

Manual installation: Manually install HDK, CANN, and torch_npu.
Docker image/build: The host only needs to install HDK (driver/firmware). CANN and torch_npu are integrated in the image.

Method 1: Manual Environment Installation¶

This method requires you to manually install HDK, CANN, and torch_npu.

1. Versions and Download Links¶

This document lists the latest dependency versions and download links. Please choose according to your device model:

Device	Dependency	Link
A3	HDK	https://www.hiascend.com/hardware/firmware-drivers/community?product=1&model=30&cann=9.0.0&driver=Ascend+HDK+26.0.RC1
	CANN	https://www.hiascend.com/developer/download/community/result?module=cann&cann=9.0.0
	torch_npu	2.7.1.post4
A2	HDK	https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=9.0.0&driver=Ascend+HDK+26.0.RC1
	CANN	https://www.hiascend.com/developer/download/community/result?module=cann&cann=9.0.0
	torch_npu	2.7.1.post4

2. Drivers and Firmware¶

Choose the HDK installation package in .run or .deb format as appropriate. Separate packages are provided for aarch64 and x86, so pick the one that matches the host architecture.

The following uses the A2 series as an example. For the A3 series, the firmware and driver package names differ; choose based on the linked page.

A3 package names look like Atlas-A3-hdk-npu-driver_25.0.rc1.3_linux-aarch64.run and Atlas-A3-hdk-npu-firmware_7.7.0.3.228.run. The installation procedure is the same.
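
As a rough aid for picking the right package, the following Python sketch maps the host's reported machine type onto the two architecture families mentioned above and fills the driver file-name template used in this section. The helper names (arch_family, driver_package_name) and the alias table are illustrative assumptions, not part of any Ascend tooling. The chip type, version, and exact architecture label in the file name should always be copied from the download page.

```python
import platform

# Assumed aliases for the values platform.machine() commonly reports.
_ARCH_ALIASES = {
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "x86_64": "x86",
    "amd64": "x86",
}


def arch_family(machine: str) -> str:
    """Return 'aarch64' or 'x86' for a raw machine string."""
    key = machine.strip().lower()
    if key not in _ARCH_ALIASES:
        raise ValueError(f"unrecognized architecture: {machine!r}")
    return _ARCH_ALIASES[key]


def driver_package_name(chip_type: str, version: str, arch: str) -> str:
    """Fill the driver file-name template; all three values come from the download page."""
    return f"Ascend-hdk-{chip_type}-npu-driver_{version}_linux-{arch}.run"


# Report what the current interpreter sees, without assuming it is a known value.
print("Reported machine type:", platform.machine())
```

Calling arch_family(platform.machine()) on the target server tells you which family of packages to look for before downloading anything.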

Upload installation packages: log in as root and upload the driver and firmware packages to the server (e.g., /home).

Add execution permissions: enter the package directory and run the following commands.

chmod +x Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run
chmod +x Ascend-hdk-<chip_type>-npu-firmware_<version>.run

Install drivers and firmware. The default installation path is /usr/local/Ascend.

Install driver:
```
./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --full --install-for-all
```
If “Driver package installed successfully!” appears, the installation succeeded.

Install firmware:
```
./Ascend-hdk-<chip_type>-npu-firmware_<version>.run --full
```
If “Firmware package installed successfully!” appears, the installation succeeded.

Note

If the default user HwHiAiUser has not been created, specify the user and group in the installation command: ./Ascend-hdk-*.run --full --install-username=<username> --install-usergroup=<usergroup>
If the installer prompts for a reboot, restart the server:
```
reboot
```
Verify installation. Run the following command to check driver load status:
```
npu-smi info
```
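
The same check can be scripted, for example as a pre-flight step before launching a job. The sketch below is a minimal illustration: the function name driver_loaded is hypothetical, and it assumes that npu-smi info exits with status 0 when the driver is loaded, which this document does not state explicitly.

```python
import shutil
import subprocess


def driver_loaded(command: str = "npu-smi") -> bool:
    """Return True if `<command> info` is on PATH and exits with status 0."""
    if shutil.which(command) is None:
        return False  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [command, "info"],
            capture_output=True,
            text=True,
            timeout=60,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a zero exit status means the driver answered the query.
    return result.returncode == 0


print("Driver check passed:", driver_loaded())
```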

3. CANN¶

Choose the CANN installation package in .run or .deb format as appropriate. Separate packages are provided for aarch64 and x86, so pick the one that matches the host architecture.

The following uses the A2 series as an example. For the A3 series, only the ops package name differs; choose it from the linked page. A3 package names look like Ascend-cann-A3-ops_9.0.0_linux-aarch64.run. The installation procedure is the same.

(1) Install Toolkit development kit¶

Toolkit is used for training, inference, and development.

Note

Ensure the installation directory has more than 10 GB of free space.
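
The free-space requirement can be checked before running the installer. This is a small sketch using only the Python standard library; the function name has_free_space is hypothetical, and the threshold treats the 10 GB figure as 10 GiB, which is an assumption.

```python
import shutil

GIB = 1024 ** 3


def has_free_space(path: str, required_bytes: int = 10 * GIB) -> bool:
    """Return True if the filesystem holding `path` has more than required_bytes free."""
    return shutil.disk_usage(path).free > required_bytes


# "." is only a stand-in; pass an existing directory on the target filesystem.
print("Enough space here:", has_free_space("."))
```

The path must already exist, so if the installation directory has not been created yet, pass its nearest existing parent (for example /usr/local for a root installation).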

Grant execute permission and install. The default installation path is /usr/local/Ascend for the root user and ${HOME}/Ascend for a non-root user.

chmod +x Ascend-cann-toolkit_<version>_linux-aarch64.run
./Ascend-cann-toolkit_<version>_linux-aarch64.run --install

Configure environment variables. For the root user, it is recommended to add the following line to ~/.bashrc:
```
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
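
If you script the setup, appending the line only once avoids piling up duplicates in ~/.bashrc. The following is a minimal sketch under that assumption; ensure_line is a hypothetical helper, and the default line is the root-installation path shown above.

```python
from pathlib import Path

SOURCE_LINE = "source /usr/local/Ascend/ascend-toolkit/set_env.sh"


def ensure_line(rc_file: Path, line: str = SOURCE_LINE) -> bool:
    """Append `line` to rc_file unless an identical line exists. Return True if appended."""
    text = rc_file.read_text() if rc_file.exists() else ""
    if any(entry.strip() == line for entry in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with rc_file.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

For example, ensure_line(Path.home() / ".bashrc") adds the line on the first call and does nothing on later calls.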

(2) Install ops operator package¶

Install the ops package after Toolkit. To install static libraries, replace --install with --devel.

chmod +x Ascend-cann-<chip_type>-ops_<version>_linux-aarch64.run
./Ascend-cann-<chip_type>-ops_<version>_linux-aarch64.run --install

(3) Install NNAL neural network acceleration library (optional)¶

NNAL includes the ATB and SiP acceleration libraries. Install it after Toolkit.

Grant execute permission and install:

chmod +x Ascend-cann-nnal_<version>_linux-aarch64.run
./Ascend-cann-nnal_<version>_linux-aarch64.run --install

Configure environment variables:

(Choose one of the two; do not configure both. The paths below assume a non-root installation under ${HOME}/Ascend.)

# ATB
source ${HOME}/Ascend/nnal/atb/set_env.sh

# SiP
source ${HOME}/Ascend/nnal/asdsip/set_env.sh

4. torch-npu¶

It is recommended to install the torch-npu plugin as part of the LLaMA-Factory installation. The LLaMA-Factory dependency list is kept up to date with a stable version of the torch-npu plugin.

pip install -r requirements/npu.txt

You can also download and install the torch-npu plugin manually, for example:

pip install torch_npu-<version>-cp311-cp311-manylinux_2_17_aarch64.whl

Notes on installing the torch-npu plugin:

torch_npu packages are built per Python version, so choose the package that matches your environment. Running pip install torch_npu also installs the corresponding version of torch.
The installed torch-npu and torch versions must match. For example, torch-npu 2.7.1 requires torch 2.7.1. Dependency conflicts during installation can occasionally change the torch version, which breaks this match and leads to errors.
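
The alignment rule can be checked mechanically. The sketch below compares only the leading major.minor.patch part of each version string, so that suffixes such as .post4 are ignored; that comparison rule is an assumption drawn from the 2.7.1 example above, and the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional


def base_version(version: str) -> str:
    """Extract the leading 'major.minor.patch', e.g. '2.7.1.post4' -> '2.7.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return ".".join(match.groups())


def versions_aligned(torch_version: str, torch_npu_version: str) -> bool:
    """Return True if both versions share the same major.minor.patch."""
    return base_version(torch_version) == base_version(torch_npu_version)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


torch_v = installed_version("torch")
npu_v = installed_version("torch_npu")
if torch_v is None or npu_v is None:
    print("torch and/or torch_npu is not installed in this environment")
else:
    print("aligned" if versions_aligned(torch_v, npu_v) else "MISMATCH", torch_v, npu_v)
```

Running this after every pip install that touches the environment makes an accidental torch version change visible immediately.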

5. Verify Installation¶

Run the following Python script:

import torch
import torch_npu
print(torch.npu.is_available())

Expected output: True

This indicates that HDK, CANN, and torch_npu are all installed correctly and functioning.
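
For a slightly stronger check than is_available(), a small computation can be run on the device. The following is a sketch rather than an official test: the function name npu_smoke_test is hypothetical, and it assumes the first card is addressable as "npu:0".

```python
def npu_smoke_test() -> str:
    """Run a tiny addition on the NPU and describe the outcome as a string."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the NPU backend)
    except Exception as exc:  # missing package or missing CANN libraries
        return f"import failed: {exc}"

    if not torch.npu.is_available():
        return "torch_npu imported, but no NPU is available"

    x = torch.ones(2, 2).to("npu:0")  # assumption: first card is npu:0
    y = (x + x).cpu()
    expected = torch.full((2, 2), 2.0)
    if torch.equal(y, expected):
        return "NPU compute OK"
    return "NPU compute returned an unexpected result"


print(npu_smoke_test())
```

A failure at the import step usually points to torch_npu or the CANN environment variables, while a failure at the availability step points to the driver or device visibility.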

Method 2: Docker Pre-installed Image¶

Note

Ensure the host has installed firmware and drivers; refer to the previous section for installation.

LLaMA-Factory’s official images are hosted on Docker Hub and quay.io; the images are identical.

1. Pull Image¶

Download the latest image from the main branch (choose A2 or A3 according to your device). For specific version images, visit the image repository to check the tags.

# Docker Hub
docker pull hiyouga/llamafactory:latest-npu-a2
docker pull hiyouga/llamafactory:latest-npu-a3

# quay.io
docker pull quay.io/ascend/llamafactory:latest-npu-a2
docker pull quay.io/ascend/llamafactory:latest-npu-a3

2. Start Container¶

Start the container with the following command (adjust DOCKER_IMAGE and the --device entries as appropriate):

CONTAINER_NAME=llama_factory_npu
DOCKER_IMAGE=hiyouga/llamafactory:latest-npu-a2

docker run -itd \
    --cap-add=SYS_PTRACE \
    --net=host \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --shm-size=1200g \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /data:/data \
    --name "$CONTAINER_NAME" \
    "$DOCKER_IMAGE" \
    /bin/bash

Note

Adding --privileged=true to the docker run command enables privileged mode, which grants the container full access to hardware management devices (such as /dev/davinci_manager). This resolves driver initialization failures in multi-container scenarios caused by permission restrictions, so that NPU resources can be reused across containers.

Without this parameter, later containers may fail to access the devices due to insufficient permissions once the first container occupies them. Because privileged mode grants broad permissions, evaluate the security risks carefully before using it in production.

3. Enter Container¶

docker exec -it llama_factory_npu bash

Note

Mount specific NPU cards with --device /dev/davinci<N> (N from 0 to 7). Device numbers inside the container are remapped automatically (e.g., davinci6 on the physical machine becomes device 0 in the container).
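
When only a subset of cards should be visible, the --device arguments can be generated instead of edited by hand. This sketch assumes the three management devices from the docker run command in this section are always required; device_flags is a hypothetical helper.

```python
from typing import Iterable, List

# Management devices passed by the docker run command in this section.
MANAGEMENT_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]


def device_flags(card_ids: Iterable[int]) -> List[str]:
    """Build --device flags for the selected cards (0 to 7) plus the management devices."""
    ids = sorted(set(card_ids))
    for card in ids:
        if not 0 <= card <= 7:
            raise ValueError(f"card id out of range 0-7: {card}")
    flags = [f"--device=/dev/davinci{card}" for card in ids]
    flags += [f"--device={dev}" for dev in MANAGEMENT_DEVICES]
    return flags


# Print the flags for cards 6 and 7 in the line-continuation style used above.
print(" \\\n    ".join(device_flags([6, 7])))
```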

After entering the container, you can start training with llamafactory-cli train. If the current shell has not loaded the Ascend environment variables, run source /usr/local/Ascend/ascend-toolkit/set_env.sh first.

Method 3: Docker Local Build¶

Note

Ensure the host has installed firmware and drivers.

LLaMA-Factory provides two build methods: 1. Build Using Docker Build and 2. Build Using Docker Compose.

1. Build Using Docker Build¶

Build image — execute at the project root:

# Ascend-A2
docker build -f ./docker/docker-npu/Dockerfile --build-arg INSTALL_DEEPSPEED=false --build-arg PIP_INDEX=https://pypi.org/simple -t llamafactory:latest .

# Ascend-A3
docker build -f ./docker/docker-npu/Dockerfile --build-arg BASE_IMAGE=quay.io/ascend/cann:9.0.0-a3-ubuntu22.04-py3.11 --build-arg INSTALL_DEEPSPEED=false --build-arg PIP_INDEX=https://pypi.org/simple -t llamafactory:latest .

Note

Modify the BASE_IMAGE parameter to specify other CANN versions (see ascend/cann).
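
To make the role of BASE_IMAGE explicit, the two build commands above can be expressed as one function in which the base image is the only thing that varies. This is an illustrative sketch: build_command is a hypothetical helper that only assembles the argument list and does not run Docker.

```python
import shlex
from typing import List, Optional

# A3 base image taken from the build command in this section.
A3_BASE_IMAGE = "quay.io/ascend/cann:9.0.0-a3-ubuntu22.04-py3.11"


def build_command(base_image: Optional[str] = None,
                  tag: str = "llamafactory:latest") -> List[str]:
    """Assemble the docker build arguments; omit BASE_IMAGE to use the Dockerfile default."""
    cmd = ["docker", "build", "-f", "./docker/docker-npu/Dockerfile"]
    if base_image is not None:
        cmd += ["--build-arg", f"BASE_IMAGE={base_image}"]
    cmd += [
        "--build-arg", "INSTALL_DEEPSPEED=false",
        "--build-arg", "PIP_INDEX=https://pypi.org/simple",
        "-t", tag,
        ".",
    ]
    return cmd


print(shlex.join(build_command()))               # A2: Dockerfile default base image
print(shlex.join(build_command(A3_BASE_IMAGE)))  # A3: explicit base image
```

Passing a different image string selects another CANN version in the same way.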

Start container

CONTAINER_NAME=llama_factory_npu
DOCKER_IMAGE=llamafactory:latest
docker run -itd \
    --cap-add=SYS_PTRACE \
    --net=host \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --shm-size=1200g \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /data:/data \
    --name "$CONTAINER_NAME" \
    "$DOCKER_IMAGE" \
    /bin/bash

Enter container
```
docker exec -it llama_factory_npu bash
```

2. Build Using Docker Compose¶

Enter directory

cd docker/docker-npu

Build image and start container directly — choose commands based on device model:

# Ascend-A2
docker-compose up -d

# Ascend-A3
docker-compose --profile a3 up -d llamafactory-a3

Enter container

docker exec -it llamafactory-a2 bash

Note

Before building, check the devices list in docker-compose.yml. Currently only card 0 is mounted; modify the list as needed.