
【AIGC】2. Running Keras locally on a Mac with the GPU

2024-07-12


Running Keras locally on a Mac with the GPU

1. Background

From the previous article, we saw that when a large model runs on the CPU, CPU usage is very high. Here we consider how, if the machine has a GPU, we can take full advantage of the GPU's strength in vector operations so that machine-learning workloads run faster and better.

2. Technical Background

We know that the mainstream graphics cards supporting machine learning today are the Nvidia series, commonly known as "N cards", whereas Mac machines usually come with AMD or Intel graphics. Because the hardware families use different instruction sets, the software layers above them normally require different implementations.
However, there is a technology called PlaidML that encapsulates the differences between different graphics cards.

PlaidML project address: https://github.com/plaidml/plaidml
PlaidML currently supports tools such as Keras, ONNX, and nGraph, so you can build a model directly in Keras and easily use the GPU on your MacBook.
With PlaidML, deep learning training can be performed easily regardless of whether the graphics card is NVIDIA, AMD, or Intel.

Reference: Using PlaidML on a Mac to accelerate reinforcement-learning training

3. Experimental Verification

In this experiment, a standard Keras model is run for several rounds on both the CPU and the GPU, and the elapsed time is recorded and compared.

Local Configuration

The hardware and software configuration of the Mac used in this test is as follows:
[Screenshot: hardware and software configuration of the test machine]

Installing PlaidML

Since installing the dependency packages requires interactive input, the plaidml-keras package is installed on the command line, while the code itself is executed in Jupyter.

Because Jupyter needs a dedicated kernel in order to use a virtual environment, it is recommended to run this verification in the system Python environment. Readers who are familiar with configuring Jupyter inside a virtual environment can try that instead; a rough sketch follows.
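For those who do want to work inside a virtual environment, one common way to register it as a Jupyter kernel is roughly the following (the environment and kernel names here are only examples, not from the original article):

python3 -m venv venv
source venv/bin/activate
pip install ipykernel plaidml-keras
python -m ipykernel install --user --name plaidml-venv

After that, select the plaidml-venv kernel in Jupyter before running the code below.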

Install plaidml-keras

pip3 install plaidml-keras

After installing the latest version, plaidml-keras 0.7.0, with the command pip3 install plaidml-keras, I ran into a bug during the initialization step. I downgraded to 0.6.4 and it worked fine; later I reinstalled 0.7.0 and it then worked fine as well. If you hit the same problem, you can pin the older version as shown below.
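Pinning the older version (the version number is taken from the experience above) is a single pip command:

pip3 install plaidml-keras==0.6.4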
See plaidml on GitHub.

Configure the default graphics card

Execute from the command line

plaidml-setup

The interactive content is as follows

(venv) tsingj@tsingjdeMacBook-Pro-2 ~  # plaidml-setup

PlaidML Setup (0.6.4)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the Apache License 2.0


Default Config Devices:
   metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
   metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
   opencl_amd_radeon_pro_5300m_compute_engine.0 : AMD AMD Radeon Pro 5300M Compute Engine (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)
   opencl_intel_uhd_graphics_630.0 : Intel Inc. Intel(R) UHD Graphics 630 (OpenCL)
   metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:

The setup lists the devices it detects and asks whether to use only the 2 graphics cards supported by default or all 6 devices that are supported experimentally.
You can see that the two graphics cards supported by default are the ones shown in the configuration screenshot above. For the sake of testing stability, type n and press Enter.

Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : metal_intel(r)_uhd_graphics_630.0
   2 : metal_amd_radeon_pro_5300m.0

Default device? (1,2)[1]:1

Selected device:
    metal_intel(r)_uhd_graphics_630.0

Next, set a default device from the devices selected above. Here we first set metal_intel(r)_uhd_graphics_630.0 as the default device; this device actually performs poorly, and later we will set metal_amd_radeon_pro_5300m.0 as the default device for comparison.
Type 1 and press Enter.

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

Save settings to /Users/tsingj/.plaidml? (y,n)[y]:y
Success!

Press Enter to write the configuration into the default configuration file (the ~/.plaidml file shown above) and complete the setup.
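As the setup output above mentions, the chosen device can also be overridden with the PLAIDML_DEVICE_IDS environment variable instead of re-running plaidml-setup. A rough sketch (I have not verified that this takes precedence over the saved configuration) is to set it before loading the backend:

import os
os.environ["PLAIDML_DEVICE_IDS"] = "metal_amd_radeon_pro_5300m.0"  # device id exactly as listed by plaidml-setup
import plaidml.keras
plaidml.keras.install_backend()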

Running CPU-based code

In this section, Jupyter is used to run a simple piece of Keras code and measure how long it takes.

Step 1: Import the Keras packages and load the cifar10 dataset. This may involve downloading data from the external network; if you run into problems, refer to "Basic issues in using keras".
#!/usr/bin/env python
import numpy as np
import os
import time
import keras
import keras.applications as kapp
from keras.datasets import cifar10

# Load CIFAR-10 (downloaded automatically on first use)
(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()

# Keep a single batch of 8 images and upsample them from 32x32 to 224x224
# (32 * 7 = 224), the input size expected by VGG19
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)

Note that the default Keras backend here should be TensorFlow; check the output:

2024-07-11 14:36:02.753107: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
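If you want to confirm which backend Keras is actually using, a quick check (a convenience added here, not part of the original steps) is:

import keras
print(keras.backend.backend())  # expected to print 'tensorflow' here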

Step 2: Load the model. If the model weights do not exist locally, they will be downloaded automatically; if you run into problems, refer to "Basic issues in using keras".
model = kapp.VGG19()
Step 3 Model compilation
model.compile(optimizer='sgd', loss='categorical_crossentropy',metrics=['accuracy'])
Step 4 Make a prediction
print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

Running initial batch (compiling tile program)
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step

Step 5: Make 10 predictions
# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
    print("Ran in {} seconds".format(time.time() - start))

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 891ms/step
Ran in 0.9295139312744141 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 923ms/step
Ran in 1.8894760608673096 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 893ms/step
Ran in 2.818492889404297 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 932ms/step
Ran in 3.7831668853759766 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 892ms/step
Ran in 4.71358585357666 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 860ms/step
Ran in 5.609835863113403 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 878ms/step
Ran in 6.5182459354400635 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 871ms/step
Ran in 7.423128128051758 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 896ms/step
Ran in 8.352543830871582 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 902ms/step
Ran in 9.288795948028564 seconds

Running GPU-based code

Graphics card used: metal_intel(r)_uhd_graphics_630.0

Step 0: Import Keras through PlaidML, and then perform the Keras-related operations
# Importing PlaidML. Make sure you follow this order
import plaidml.keras
plaidml.keras.install_backend()
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

Note:
1. With plaidml-keras 0.7.0, the plaidml.keras.install_backend() call may report an error (see the installation note above)
2. This step imports Keras through PlaidML and sets the backend engine to PlaidML instead of TensorFlow (a quick check follows below)
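
The same backend check as in the CPU section can be used here to confirm that PlaidML is active (the exact string returned may vary between versions):

import keras
print(keras.backend.backend())  # should no longer report 'tensorflow' once PlaidML is installed as the backend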

Step 1: Import the keras package and the cifar10 data
#!/usr/bin/env python
import numpy as np
import os
import time
import keras
import keras.applications as kapp
from keras.datasets import cifar10
(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
Step 2 Import the calculation model. If the model data does not exist locally, it will be automatically downloaded.
model = kapp.VGG19()

The graphics card information is printed on the first run:

INFO:plaidml:Opening device "metal_intel(r)_uhd_graphics_630.0"

Step 3 Model compilation
model.compile(optimizer='sgd', loss='categorical_crossentropy',metrics=['accuracy'])
Step 4 Make a prediction
print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

Running initial batch (compiling tile program)

Because this run completes quickly, only this single line of output is printed.

Step 5: Make 10 predictions
# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
    print("Ran in {} seconds".format(time.time() - start))

Ran in 4.241918087005615 seconds
Ran in 8.452141046524048 seconds
Ran in 12.665411949157715 seconds
Ran in 16.849968910217285 seconds
Ran in 21.025720834732056 seconds
Ran in 25.212764024734497 seconds
Ran in 29.405478954315186 seconds
Ran in 33.594977140426636 seconds
Ran in 37.7886438369751 seconds
Ran in 41.98136305809021 seconds

Graphics card metal_amd_radeon_pro_5300m.0

Run plaidml-setup again, and when choosing the default graphics card, this time select metal_amd_radeon_pro_5300m.0 instead of metal_intel(r)_uhd_graphics_630.0.

(venv) tsingj@tsingjdeMacBook-Pro-2 ~  # plaidml-setup

PlaidML Setup (0.6.4)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the Apache License 2.0


Default Config Devices:
   metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
   metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
   opencl_amd_radeon_pro_5300m_compute_engine.0 : AMD AMD Radeon Pro 5300M Compute Engine (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)
   opencl_intel_uhd_graphics_630.0 : Intel Inc. Intel(R) UHD Graphics 630 (OpenCL)
   metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:n

Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : metal_intel(r)_uhd_graphics_630.0
   2 : metal_amd_radeon_pro_5300m.0

Default device? (1,2)[1]:2

Selected device:
    metal_amd_radeon_pro_5300m.0

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

Save settings to /Users/tsingj/.plaidml? (y,n)[y]:y
Success!
Step 0: Import Keras through PlaidML, and then perform the Keras-related operations
# Importing PlaidML. Make sure you follow this order
import plaidml.keras
plaidml.keras.install_backend()
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

Note:
1. With plaidml-keras 0.7.0, the plaidml.keras.install_backend() call may report an error (see the installation note above)
2. This step imports Keras through PlaidML and sets the backend engine to PlaidML instead of TensorFlow

Step 1: Import the keras package and the cifar10 data
#!/usr/bin/env python
import numpy as np
import os
import time
import keras
import keras.applications as kapp
from keras.datasets import cifar10
(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
Step 2 Import the calculation model. If the model data does not exist locally, it will be automatically downloaded.
model = kapp.VGG19()

INFO:plaidml:Opening device "metal_amd_radeon_pro_5300m.0"
Note that the graphics card information is printed here on the first execution.

Step 3 Model compilation
model.compile(optimizer='sgd', loss='categorical_crossentropy',metrics=['accuracy'])
Step 4 Make a prediction
print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

Running initial batch (compiling tile program)

Because this run completes quickly, only this single line of output is printed.

Step 5: Make 10 predictions
# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
    print("Ran in {} seconds".format(time.time() - start))

View the Output

Ran in 0.43606019020080566 seconds
Ran in 0.8583459854125977 seconds
Ran in 1.2787911891937256 seconds
Ran in 1.70143723487854 seconds
Ran in 2.1235032081604004 seconds
Ran in 2.5464580059051514 seconds
Ran in 2.9677979946136475 seconds
Ran in 3.390064001083374 seconds
Ran in 3.8117799758911133 seconds
Ran in 4.236911058425903 seconds

4. Evaluation and Discussion

The graphics card metal_intel(r)_uhd_graphics_630.0 has 1536 MB of memory; although it is a GPU, its compute performance here (about 4.2 s per batch) is clearly worse than this machine's 6-core CPU (about 0.93 s per batch).
The graphics card metal_amd_radeon_pro_5300m.0 has 4 GB of memory, and at about 0.42 s per batch it is roughly twice as fast as the local CPU.
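
For reference, the per-batch averages can be read off directly from the cumulative timings above; a trivial sketch of the arithmetic (the numbers are copied from the three runs above):

# cumulative seconds for 10 batches, taken from the runs above
totals = {"CPU (TensorFlow)": 9.29, "Intel UHD 630 (PlaidML)": 41.98, "AMD Radeon Pro 5300M (PlaidML)": 4.24}
for name, total in totals.items():
    print("{}: {:.2f} s per batch".format(name, total / 10))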

This shows the clear advantage that a capable GPU brings to machine-learning workloads.