NPU Installation

LLaMA-Factory supports Huawei Ascend NPU (A2/A3) devices. You can choose one of the following three methods for environment setup and usage:

Core Dependencies

All installation methods depend on the following components:

  • HDK: Firmware and drivers

  • CANN: Compute Architecture for Neural Networks, Ascend's heterogeneous computing architecture

  • torch_npu: Ascend adaptation plugin for PyTorch

Required steps vary depending on the installation method:

  • Manual installation: Manually install HDK, CANN, and torch_npu.

  • Docker image/build: Only HDK (driver/firmware) needs to be installed on the host. CANN and torch_npu are included in the image.

Method 1: Manual Environment Installation

This method requires you to manually install HDK, CANN, and torch_npu.

1. Versions and Download Links

This document lists the latest dependency versions and download links. Please choose according to your device model:

2. Drivers and Firmware

Choose the HDK installation package in .run or .deb format as appropriate, noting that packages are differentiated for aarch64 and x86.

The following uses the A2 series as an example. For the A3 series, the firmware and driver package names differ; choose based on the linked page.

A3 internal package names are similar to Atlas-A3-hdk-npu-driver_25.0.rc1.3_linux-aarch64.run and Atlas-A3-hdk-npu-firmware_7.7.0.3.228.run. The installation method does not change.
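The naming pattern used by the chmod commands in step 2 below can be sketched as a small helper. This is an illustrative sketch only: the function name and the sample values are hypothetical, and the exact architecture token in the file name should be taken from the download page rather than from the value printed here.

```python
import platform

# File name templates copied from the chmod commands in this section (A2 series).
DRIVER_TEMPLATE = "Ascend-hdk-{chip_type}-npu-driver_{version}_linux-{arch}.run"
FIRMWARE_TEMPLATE = "Ascend-hdk-{chip_type}-npu-firmware_{version}.run"


def hdk_package_names(chip_type: str, driver_version: str,
                      firmware_version: str, arch: str) -> tuple[str, str]:
    """Return (driver_file, firmware_file) for the given placeholders.

    Driver and firmware carry separate version numbers, so both are passed in.
    """
    driver = DRIVER_TEMPLATE.format(chip_type=chip_type, version=driver_version, arch=arch)
    firmware = FIRMWARE_TEMPLATE.format(chip_type=chip_type, version=firmware_version)
    return driver, firmware


if __name__ == "__main__":
    # platform.machine() only hints at the host architecture; the token used in
    # the real file name must match the one shown on the download page.
    print("Host architecture reported by the OS:", platform.machine())
    # Hypothetical sample values, for illustration only.
    print(hdk_package_names("<chip_type>", "<version>", "<version>", "aarch64"))
```

Comparing the generated names with the files you downloaded is a quick way to catch a package built for the wrong architecture before running the installer.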

  1. Upload the installation packages. Log in as root and upload the driver and firmware packages to the server (e.g., /home).

  2. Add execute permissions. Enter the package directory and run the following commands:

    chmod +x Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run
    chmod +x Ascend-hdk-<chip_type>-npu-firmware_<version>.run
    
  3. Install the driver and firmware. The default installation path is /usr/local/Ascend.

    Install driver:

    ./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --full --install-for-all
    

    If the message "Driver package installed successfully!" appears, the installation succeeded.

    Install firmware:

    ./Ascend-hdk-<chip_type>-npu-firmware_<version>.run --full
    

    If the message "Firmware package installed successfully!" appears, the installation succeeded.

    Note

    If the default user HwHiAiUser has not been created, specify the user and group in the installation command: ./Ascend-hdk-*.run --full --install-username=<username> --install-usergroup=<usergroup>

  4. Reboot if the installer output asks you to. To reboot:

    reboot
    
  5. Verify installation. Run the following command to check driver load status:

    npu-smi info
    
    (Figure: sample output of npu-smi info)
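The two checks mentioned in the steps above (whether the default user HwHiAiUser exists, and whether npu-smi info runs) could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and treating exit status 0 as "driver loaded" is an assumption, so the printed table from npu-smi info should still be inspected by eye.

```python
import pwd
import shutil
import subprocess


def user_exists(username: str) -> bool:
    """Return True if the local account database contains `username`."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True


def driver_responds(command: str = "npu-smi") -> bool:
    """Return True if `<command> info` is on PATH and exits with status 0.

    Assumption: a zero exit status is taken to mean the driver answered.
    """
    if shutil.which(command) is None:
        return False
    result = subprocess.run([command, "info"], capture_output=True, text=True, check=False)
    return result.returncode == 0


if __name__ == "__main__":
    print("HwHiAiUser exists:", user_exists("HwHiAiUser"))
    print("npu-smi info succeeded:", driver_responds())
```

If the user check prints False before installation, the --install-username and --install-usergroup options from the note in step 3 apply.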

3. CANN

Choose the CANN installation package in .run or .deb format as appropriate, noting that packages are differentiated for aarch64 and x86.

The following uses the A2 series as an example. The only difference for the A3 series is the name of the ops package; choose according to the linked page. A3 internal package names are similar to Ascend-cann-A3-ops_9.0.0_linux-aarch64.run. The installation method does not change.

(1) Install Toolkit development kit

Toolkit is used for training, inference, and development.

Note

Ensure the installation directory has more than 10 GB of free space.
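The free-space requirement can be checked before running the installer. A minimal sketch, assuming "10G" means 10 GiB (10 × 1024³ bytes); the function name is hypothetical, and the path passed in must already exist.

```python
import shutil


def has_free_space(path: str, min_gib: float = 10.0) -> bool:
    """Return True if the filesystem holding `path` has more than `min_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes > min_gib * 1024 ** 3


if __name__ == "__main__":
    # "/usr/local" is the parent of the root user's default install directory;
    # for a non-root install, check the home directory instead.
    print(has_free_space("/usr/local"))
```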

  1. Grant execute permission and install. The default installation path is /usr/local/Ascend for the root user and ${HOME}/Ascend for a non-root user.

    chmod +x Ascend-cann-toolkit_<version>_linux-aarch64.run
    ./Ascend-cann-toolkit_<version>_linux-aarch64.run --install
    
  2. Configure environment variables. The path below is the root user's default; it is recommended to add this line to ~/.bashrc so that new shells load it.

    source /usr/local/Ascend/ascend-toolkit/set_env.sh
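Because the Toolkit location depends on which user ran the installer, it can help to locate set_env.sh before sourcing it. The sketch below checks the root path shown above and, as an assumption derived from the ${HOME}/Ascend default, the equivalent path under the home directory. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def default_candidates() -> list[Path]:
    """Candidate locations of the Toolkit's set_env.sh.

    The first path appears in this guide; the second is an assumed
    non-root equivalent based on the ${HOME}/Ascend default.
    """
    return [
        Path("/usr/local/Ascend/ascend-toolkit/set_env.sh"),
        Path.home() / "Ascend" / "ascend-toolkit" / "set_env.sh",
    ]


def find_set_env(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, else None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_set_env(default_candidates())
    print(f"source {found}" if found else "set_env.sh not found in the default locations")
```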
    

(2) Install ops operator package

Install this package after the Toolkit. To install static libraries, replace --install with --devel.

chmod +x Ascend-cann-<chip_type>-ops_<version>_linux-aarch64.run
./Ascend-cann-<chip_type>-ops_<version>_linux-aarch64.run --install

(3) Install NNAL neural network acceleration library (optional)

NNAL includes the ATB and SiP acceleration libraries. Install it after the Toolkit.

  1. Grant execute permission and install:

    chmod +x Ascend-cann-nnal_<version>_linux-aarch64.run
    ./Ascend-cann-nnal_<version>_linux-aarch64.run --install
    
  2. Configure environment variables:

    (Choose one of the two; do not configure both.)

    # ATB
    source ${HOME}/Ascend/nnal/atb/set_env.sh
    
    # SiP
    source ${HOME}/Ascend/nnal/asdsip/set_env.sh
    

4. torch-npu

It is recommended to install the torch-npu plugin as part of the LLaMA-Factory installation. LLaMA-Factory's dependency list is kept up to date with a stable version of the torch-npu plugin.

pip install -r requirements/npu.txt

You can also download and install the torch-npu plugin manually, for example:

pip install torch_npu-<version>-cp311-cp311-manylinux_2_17_aarch64.whl

Notes when installing the torch-npu plugin:

  • torch_npu packages are built for specific Python versions. Choose the package that matches the Python version in your environment. When you run pip install torch_npu, the corresponding version of torch is installed with it.

  • The versions of torch-npu and torch installed in the environment must match. For example, if torch-npu is version 2.7.1, then torch must also be version 2.7.1. Dependency resolution during a later installation may occasionally change the torch version, which leads to errors.
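The alignment rule above can be checked after installation. A minimal sketch, assuming that "aligned" means the leading major.minor.patch numbers are equal and that suffixes such as local build tags or post-release markers can be ignored; that comparison rule and the function names are assumptions, not part of LLaMA-Factory or torch_npu.

```python
import re
from importlib import metadata
from typing import Optional


def release_triplet(version: str) -> tuple[int, int, int]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def versions_aligned(torch_version: str, torch_npu_version: str) -> bool:
    """Return True if both versions share the same major.minor.patch."""
    return release_triplet(torch_version) == release_triplet(torch_npu_version)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    npu_v = installed_version("torch_npu")
    print("torch:", torch_v, "| torch_npu:", npu_v)
    if torch_v and npu_v:
        print("aligned:", versions_aligned(torch_v, npu_v))
```

Running this after any later pip install makes it easy to notice when dependency resolution has silently changed the torch version.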

5. Verify Installation

Run the following Python script:

import torch
import torch_npu
print(torch.npu.is_available())

Expected output: True

(Figure: sample output of the verification script)

This indicates that HDK, CANN, and torch_npu are all installed correctly and functioning.
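When the check is run from a setup script, an import failure is easier to handle if it is reported as False instead of a traceback. The variant below wraps the same torch.npu.is_available() call; the function name is hypothetical, and catching OSError alongside ImportError is an assumption about how a missing shared library would surface.

```python
def npu_available() -> bool:
    """Return True only if torch and torch_npu import and an NPU is available."""
    try:
        import torch
        import torch_npu  # noqa: F401  (imported for its side effect on torch)
    except (ImportError, OSError):
        # Either package is missing, or a native library failed to load.
        return False
    return bool(torch.npu.is_available())


if __name__ == "__main__":
    print(npu_available())
```

A False result does not say which layer failed, so the manual script above remains the better tool for diagnosing a broken installation.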

Method 2: Docker Pre-installed Image

Note

Ensure that the firmware and drivers are installed on the host; see "2. Drivers and Firmware" under Method 1.

LLaMA-Factory’s official images are hosted on Docker Hub and quay.io; the images are identical.

1. Pull Image

Pull the latest image built from the main branch (choose A2 or A3 according to your device). For images of a specific version, check the tags in the image repository.

# Docker Hub
docker pull hiyouga/llamafactory:latest-npu-a2
docker pull hiyouga/llamafactory:latest-npu-a3

# quay.io
docker pull quay.io/ascend/llamafactory:latest-npu-a2
docker pull quay.io/ascend/llamafactory:latest-npu-a3
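The four image references above follow one pattern, a repository per registry plus a latest-npu-<series> tag. A small sketch of that pattern, with hypothetical registry keys ("dockerhub", "quay") and a hypothetical function name:

```python
# Repository paths copied from the pull commands in this section.
REPOSITORIES = {
    "dockerhub": "hiyouga/llamafactory",
    "quay": "quay.io/ascend/llamafactory",
}


def image_reference(registry: str, series: str) -> str:
    """Return the latest-image reference for a registry key and device series."""
    if registry not in REPOSITORIES:
        raise ValueError(f"unknown registry key: {registry!r}")
    if series not in ("a2", "a3"):
        raise ValueError(f"unknown device series: {series!r}")
    return f"{REPOSITORIES[registry]}:latest-npu-{series}"


if __name__ == "__main__":
    print("docker pull", image_reference("dockerhub", "a2"))
```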

2. Start Container

Start the container with the following command (modify DOCKER_IMAGE and the --device entries as appropriate):

CONTAINER_NAME=llama_factory_npu
DOCKER_IMAGE=hiyouga/llamafactory:latest-npu-a2

docker run -itd \
    --cap-add=SYS_PTRACE \
    --net=host \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --shm-size=1200g \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /data:/data \
    --name "$CONTAINER_NAME" \
    "$DOCKER_IMAGE" \
    /bin/bash

Note

The command above does not set --privileged=true. Adding it enables privileged mode, which grants the container full access to hardware management devices (such as /dev/davinci_manager). This resolves driver initialization failures caused by permission restrictions in multi-container scenarios, so that NPU resources can be reused across containers.

Without this parameter, containers started after the first one may fail to access devices that the first container already occupies, due to insufficient permissions. Privileged mode grants broad permissions, so evaluate the security risks carefully before using it in production.

3. Enter Container

docker exec -it llama_factory_npu bash

Note

Mount specified NPU cards using --device /dev/davinci<N> (supports 0–7). Device numbers inside the container are automatically remapped (e.g., physical machine davinci6 -> container device 0).
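To mount only some cards, the docker run command from step 2 can be generated instead of edited by hand. The sketch below reproduces the flags from that command and varies only the /dev/davinci<N> entries and the optional --privileged=true flag; the function name and its parameters are hypothetical.

```python
import shlex

# Management devices and volume mounts copied from the docker run command in step 2.
MANAGEMENT_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]
VOLUMES = [
    "/usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi",
    "/usr/local/dcmi:/usr/local/dcmi",
    "/etc/ascend_install.info:/etc/ascend_install.info",
    "/sys/fs/cgroup:/sys/fs/cgroup:ro",
    "/usr/local/Ascend/driver:/usr/local/Ascend/driver",
    "/data:/data",
]


def docker_run_args(image: str, name: str, cards: list[int],
                    privileged: bool = False) -> list[str]:
    """Build the docker run argument list for the selected NPU cards (0 to 7)."""
    for card in cards:
        if not 0 <= card <= 7:
            raise ValueError(f"card index out of range 0-7: {card}")
    args = ["docker", "run", "-itd", "--cap-add=SYS_PTRACE", "--net=host"]
    if privileged:
        args.append("--privileged=true")
    args += [f"--device=/dev/davinci{card}" for card in cards]
    args += [f"--device={device}" for device in MANAGEMENT_DEVICES]
    args.append("--shm-size=1200g")
    for volume in VOLUMES:
        args += ["-v", volume]
    args += ["--name", name, image, "/bin/bash"]
    return args


if __name__ == "__main__":
    # Print the command for review; nothing is executed here.
    print(shlex.join(docker_run_args(
        "hiyouga/llamafactory:latest-npu-a2", "llama_factory_npu", [6, 7])))
```

Printing the command rather than running it keeps the decision about privileged mode and device selection in the operator's hands.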

After entering the container, you can start training with llamafactory-cli train. If the current shell has not loaded the Ascend environment variables, run source /usr/local/Ascend/ascend-toolkit/set_env.sh first.

Method 3: Docker Local Build

Note

Ensure the host has installed firmware and drivers.

LLaMA-Factory provides two build methods: docker build and Docker Compose.

1. Build Using Docker Build

  1. Build the image. Run the following from the project root:

    # Ascend-A2
    docker build -f ./docker/docker-npu/Dockerfile --build-arg INSTALL_DEEPSPEED=false --build-arg PIP_INDEX=https://pypi.org/simple -t llamafactory:latest .
    
    # Ascend-A3
    docker build -f ./docker/docker-npu/Dockerfile --build-arg BASE_IMAGE=quay.io/ascend/cann:9.0.0-a3-ubuntu22.04-py3.11 --build-arg INSTALL_DEEPSPEED=false --build-arg PIP_INDEX=https://pypi.org/simple -t llamafactory:latest .
    

Note

Modify the BASE_IMAGE parameter to specify other CANN versions (see ascend/cann).
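The A2 and A3 build commands differ only in the BASE_IMAGE build argument, which the following sketch makes explicit. The function and parameter names are hypothetical, and passing INSTALL_DEEPSPEED=true is an assumption, since only the false value appears in this guide.

```python
from typing import Optional


def docker_build_args(base_image: Optional[str] = None,
                      install_deepspeed: bool = False,
                      pip_index: str = "https://pypi.org/simple",
                      tag: str = "llamafactory:latest") -> list[str]:
    """Build the docker build argument list used in this section.

    When base_image is None, the Dockerfile's own default base image applies (A2 case).
    """
    args = ["docker", "build", "-f", "./docker/docker-npu/Dockerfile"]
    if base_image is not None:
        args += ["--build-arg", f"BASE_IMAGE={base_image}"]
    deepspeed = "true" if install_deepspeed else "false"  # "true" is an assumed value
    args += [
        "--build-arg", f"INSTALL_DEEPSPEED={deepspeed}",
        "--build-arg", f"PIP_INDEX={pip_index}",
        "-t", tag,
        ".",
    ]
    return args


if __name__ == "__main__":
    # A3 build, using the base image named in this guide.
    print(" ".join(docker_build_args("quay.io/ascend/cann:9.0.0-a3-ubuntu22.04-py3.11")))
```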

  2. Start container

    CONTAINER_NAME=llama_factory_npu
    DOCKER_IMAGE=llamafactory:latest
    docker run -itd \
        --cap-add=SYS_PTRACE \
        --net=host \
        --device=/dev/davinci0 \
        --device=/dev/davinci1 \
        --device=/dev/davinci2 \
        --device=/dev/davinci3 \
        --device=/dev/davinci4 \
        --device=/dev/davinci5 \
        --device=/dev/davinci6 \
        --device=/dev/davinci7 \
        --device=/dev/davinci_manager \
        --device=/dev/devmm_svm \
        --device=/dev/hisi_hdc \
        --shm-size=1200g \
        -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
        -v /usr/local/dcmi:/usr/local/dcmi \
        -v /etc/ascend_install.info:/etc/ascend_install.info \
        -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /data:/data \
        --name "$CONTAINER_NAME" \
        "$DOCKER_IMAGE" \
        /bin/bash
    

Note

The command above does not set --privileged=true. Adding it enables privileged mode, which grants the container full access to hardware management devices (such as /dev/davinci_manager). This resolves driver initialization failures caused by permission restrictions in multi-container scenarios, so that NPU resources can be reused across containers.

Without this parameter, containers started after the first one may fail to access devices that the first container already occupies, due to insufficient permissions. Privileged mode grants broad permissions, so evaluate the security risks carefully before using it in production.

  3. Enter container

    docker exec -it llama_factory_npu bash
    

2. Build Using Docker Compose

  1. Enter directory

    cd docker/docker-npu

  2. Build the image and start the container in one step. Choose the command for your device model:

    # Ascend-A2
    docker-compose up -d

    # Ascend-A3
    docker-compose --profile a3 up -d llamafactory-a3

  3. Enter container. The command below enters the A2 container; for A3, substitute the name of the A3 container.

    docker exec -it llamafactory-a2 bash

Note

Before building, check the devices list in docker-compose.yml. Currently only card 0 is mounted during build; modify as needed.
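When more cards are needed, the additional entries for the devices list can be generated rather than typed. This sketch assumes the list uses the Compose HOST_PATH:CONTAINER_PATH mapping form; the actual layout of docker-compose.yml is not shown in this guide, so compare the output with the existing card 0 entry before pasting. The function names are hypothetical.

```python
def compose_device_entries(cards: list[int]) -> list[str]:
    """Return host:container device mappings for the given NPU card indices (0 to 7)."""
    entries = []
    for card in cards:
        if not 0 <= card <= 7:
            raise ValueError(f"card index out of range 0-7: {card}")
        path = f"/dev/davinci{card}"
        entries.append(f"{path}:{path}")
    return entries


def as_yaml_list(entries: list[str], indent: int = 6) -> str:
    """Format entries as quoted YAML list items with the given indentation."""
    pad = " " * indent
    return "\n".join(f'{pad}- "{entry}"' for entry in entries)


if __name__ == "__main__":
    # The indentation is a guess; match it to the surrounding lines in the file.
    print(as_yaml_list(compose_device_entries([0, 1, 2, 3])))
```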