Cloud Integration
You can run NVTabular on the cloud using Amazon Web Services (AWS) or Google Cloud Platform (GCP).
Amazon Web Services
Amazon Web Services (AWS) offers EC2 instances with NVIDIA GPU support. NVTabular can be used with 1x, 4x, or 8x GPU instances (or a multi-node setup). We're using an EC2 instance with 8x NVIDIA A100 GPUs to demonstrate the steps below. Check the hourly cost ($/h) of this instance type and adjust the instance type to your budget before launching it.
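For example, you can confirm the GPU configuration of a candidate instance type with the aws-cli before launching it. This is only a sketch and assumes the CLI is already configured with your credentials and region:

# Show the GPU configuration of the p4d.24xlarge instance type
aws ec2 describe-instance-types --instance-types p4d.24xlarge --query "InstanceTypes[0].GpuInfo"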
To run NVTabular on the cloud using AWS, do the following:
Start an EC2 instance with the NVIDIA Deep Learning AMI using the aws-cli.
# Starts the P4D instance with 8x NVIDIA A100 GPUs (take a look at the $/h for this instance type before using them)
aws ec2 run-instances --image-id ami-04c0416d6bd8e4b1f --count 1 --instance-type p4d.24xlarge --key-name <MyKeyPair> --security-groups <my-sg>
SSH into the machine.
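For example, assuming the key pair from the previous step and the default ubuntu user of the Ubuntu-based Deep Learning AMI (adjust both placeholders to your setup):

# Connect using the key pair specified when launching the instance
ssh -i <MyKeyPair>.pem ubuntu@<instance-public-dns>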
Create a RAID volume by running the following commands:
Depending on the EC2 instance, the machine may include local disk storage. We can optimize the performance by creating a RAID volume. Based on our experience, two NVMe volumes yield the best performance.
sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.ext4 -L MY_RAID /dev/md0
sudo mkdir -p /mnt/raid
sudo mount LABEL=MY_RAID /mnt/raid
sudo chmod -R 777 /mnt/raid

# Copy the dataset into the RAID directory:
cp -r data/ /mnt/raid/data/
Launch the NVTabular docker container by running the following command:
docker run --gpus all --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE -v /mnt/raid:/raid nvcr.io/nvidia/nvtabular:0.3 /bin/bash
Start the jupyter-lab server by running the following command:
jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token='<password>'
Google Cloud Platform
Google Cloud Platform (GCP) offers Compute Engine instances with NVIDIA GPU support. We're using a VM with 8x NVIDIA A100 GPUs and eight local NVMe SSDs configured as RAID 0 to demonstrate the steps below.
To run NVTabular on the cloud using GCP, do the following:
Configure and create the VM with the following specifications (a sample gcloud command is sketched after this list):
GPU: 8xA100 (a2-highgpu-8g)
Boot Disk: Ubuntu version 18.04
Storage: Local 8xSSD-NVMe
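A VM with these specifications can be created from the gcloud CLI. The following is only a sketch; the instance name, zone, boot image, and disk size below are assumptions to adapt to your project:

# Hypothetical example: create an a2-highgpu-8g VM with eight local NVMe SSDs
gcloud compute instances create nvtabular-demo \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-8g \
    --image-family=ubuntu-1804-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=200GB \
    --maintenance-policy=TERMINATE \
    --local-ssd=interface=NVME --local-ssd=interface=NVME \
    --local-ssd=interface=NVME --local-ssd=interface=NVME \
    --local-ssd=interface=NVME --local-ssd=interface=NVME \
    --local-ssd=interface=NVME --local-ssd=interface=NVME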
Install the appropriate NVIDIA drivers and CUDA by running the following commands:
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt -y update
sudo apt -y install cuda
nvidia-smi # Check the installation
Install Docker with NVIDIA container support (nvidia-docker2) by running the following commands:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get -y update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi # Check the installation
Configure the storage as RAID 0 by running the following commands:
# Adjust --raid-devices and the device list to match the number of local NVMe SSDs on your VM
sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 -L MY_RAID /dev/md0
sudo mkdir -p /mnt/raid
sudo mount LABEL=MY_RAID /mnt/raid
sudo chmod -R 777 /mnt/raid

# Copy data to RAID
cp -r data/ /mnt/raid/data/
Run the container using the following command:
docker run --gpus all --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE -v /mnt/raid:/raid nvcr.io/nvidia/nvtabular:0.3 /bin/bash
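If you also want to use JupyterLab on GCP, you can start it inside the container with the same command as in the AWS setup above:

jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token='<password>'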