2024-07-12
In the previous article we saw that when a large model runs on the CPU, CPU usage becomes very high. So here we consider how, if the machine has a GPU, to exploit the GPU's advantage in vector operations and make machine-learning workloads run faster and better.
The mainstream graphics cards for machine learning today are NVIDIA's (commonly called "N cards"), whereas Mac machines usually ship with AMD graphics cards. The two vendors' hardware differ enough that the software layers above them require different implementations.
Fortunately, a technology called PlaidML can be used to paper over the differences between graphics cards, including AMD's.
PlaidML project address: https://github.com/plaidml/plaidml
PlaidML already supports frameworks such as Keras, ONNX, and nGraph, so you can build a model directly with Keras and easily use the GPU on your MacBook.
With PlaidML, deep-learning training can be run regardless of whether the graphics card is NVIDIA, AMD, or Intel.
Reference: Mac uses PlaidML to accelerate reinforcement learning training
In this experiment, a conventional Keras model is run for multiple rounds on the CPU and on the GPU, and the elapsed times are recorded and compared.
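The timing pattern used throughout is simple: run one warm-up predict() call (the first batch also triggers backend compilation), then time ten more calls. A minimal sketch of that harness, with time_predict being a hypothetical helper rather than code from the original post:
import time

def time_predict(model, x, batch_size, n=10):
    model.predict(x=x, batch_size=batch_size)  # warm-up; the first batch compiles the program
    start = time.time()
    for _ in range(n):
        model.predict(x=x, batch_size=batch_size)
    return (time.time() - start) / n  # average seconds per batch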
The hardware and software configuration of the Mac used for this test is shown in the screenshot below.
Since installing the dependency packages requires interactive prompts, the PlaidML package is installed from the command line, while the code itself is executed in Jupyter.
Because Jupyter needs a dedicated kernel to use a virtual environment, it is simplest to verify things in the system Python environment; readers familiar with configuring Jupyter kernels for virtual environments (for example via python -m ipykernel install --user --name <env>) can work in a virtualenv instead.
pip3 install plaidml-keras
I first installed the latest plaidml-keras (0.7.0) with pip3 install plaidml-keras and hit a bug during initialization; downgrading to 0.6.4 made it work. Curiously, reinstalling 0.7.0 afterwards also worked fine.
See also: plaidml on GitHub
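Given this version sensitivity, it is worth confirming which versions pip actually installed. A quick check (pkg_resources ships with setuptools, so nothing extra is needed; the package names are the ones installed above):
import pkg_resources
for pkg in ("plaidml", "plaidml-keras", "keras"):
    print(pkg, pkg_resources.get_distribution(pkg).version)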
Next, run the setup tool from the command line:
plaidml-setup
The interactive session is as follows:
(venv) tsingj@tsingjdeMacBook-Pro-2 ~ # plaidml-setup
PlaidML Setup (0.6.4)
Thanks for using PlaidML!
Some Notes:
* Bugs and other issues: https://github.com/plaidml/plaidml
* Questions: https://stackoverflow.com/questions/tagged/plaidml
* Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
* PlaidML is licensed under the Apache License 2.0
Default Config Devices:
metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)
Experimental Config Devices:
llvm_cpu.0 : CPU (LLVM)
metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
opencl_amd_radeon_pro_5300m_compute_engine.0 : AMD AMD Radeon Pro 5300M Compute Engine (OpenCL)
opencl_cpu.0 : Intel CPU (OpenCL)
opencl_intel_uhd_graphics_630.0 : Intel Inc. Intel(R) UHD Graphics 630 (OpenCL)
metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)
Using experimental devices can cause poor performance, crashes, and other nastiness.
Enable experimental device support? (y,n)[n]:
plaidml-setup lists the supported devices: the 2 graphics cards supported by default, plus all 6 supported devices if experimental support is enabled.
The two default devices are the same two graphics cards shown in the screenshot at the beginning. For test stability, type n (or simply press Enter) to keep experimental support off.
Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:
1 : metal_intel(r)_uhd_graphics_630.0
2 : metal_amd_radeon_pro_5300m.0
Default device? (1,2)[1]:1
Selected device:
metal_intel(r)_uhd_graphics_630.0
From the default devices, we now choose one as the default. Here we first set metal_intel(r)_uhd_graphics_630.0 as the default device; this device actually has the weaker performance, and later we will set metal_amd_radeon_pro_5300m.0 as the default for comparison.
Type 1 and press Enter.
Almost done. Multiplying some matrices...
Tile code:
function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.
Save settings to /Users/tsingj/.plaidml? (y,n)[y]:y
Success!
Type y (or just press Enter) to write the settings to the default configuration file, ~/.plaidml, completing the setup.
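As the setup output above mentions, you can also override the device per process instead of re-running plaidml-setup, by setting PLAIDML_DEVICE_IDS before the backend is loaded. A minimal sketch, using a device id copied from the list printed above:
import os
# Override the saved default and pick the AMD GPU for this process only.
os.environ["PLAIDML_DEVICE_IDS"] = "metal_amd_radeon_pro_5300m.0"
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"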
In this section, Jupyter is used to run a simple piece of model code and measure its run time.
#!/usr/bin/env python
import numpy as np
import os
import time
import keras
import keras.applications as kapp
from keras.datasets import cifar10

# Load CIFAR-10 and keep a single batch of images.
(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
# CIFAR-10 images are 32x32; repeating each pixel 7x along both spatial
# axes yields the 224x224 input that VGG19 expects.
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
Note that the default Keras backend here is TensorFlow; you can confirm this from the log output:
2024-07-11 14:36:02.753107: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
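To confirm programmatically which backend Keras has loaded, keras.backend exposes it directly:
import keras.backend as K
print(K.backend())  # "tensorflow" here; after switching, it should name the plaidml backend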
model = kapp.VGG19()
model.compile(optimizer='sgd', loss='categorical_crossentropy',metrics=['accuracy'])
print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)
Running initial batch (compiling tile program)
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step
# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
    print("Ran in {} seconds".format(time.time() - start))
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 891ms/step
Ran in 0.9295139312744141 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 923ms/step
Ran in 1.8894760608673096 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 893ms/step
Ran in 2.818492889404297 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 932ms/step
Ran in 3.7831668853759766 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 892ms/step
Ran in 4.71358585357666 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 860ms/step
Ran in 5.609835863113403 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 878ms/step
Ran in 6.5182459354400635 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 871ms/step
Ran in 7.423128128051758 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 896ms/step
Ran in 8.352543830871582 seconds
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 902ms/step
Ran in 9.288795948028564 seconds
# Importing PlaidML. Make sure you follow this order
import plaidml.keras
plaidml.keras.install_backend()
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
Notes:
1. With plaidml 0.7.0, the plaidml.keras.install_backend() call reports an error.
2. This step loads Keras through PlaidML, switching the backend engine from TensorFlow to PlaidML.
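An alternative wiring, assuming a Keras version with external-backend support (2.2.2 or later), is to set the environment variable before keras is imported anywhere, avoiding the install_backend() call entirely:
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"  # must run before any keras import
import keras  # Keras now loads the PlaidML backend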
#!/usr/bin/env python
import numpy as np
import os
import time
import keras
import keras.applications as kapp
from keras.datasets import cifar10
(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
model = kapp.VGG19()
On the first run, the graphics card information is printed:
INFO:plaidml:Opening device "metal_intel(r)_uhd_graphics_630.0"
model.compile(optimizer='sgd', loss='categorical_crossentropy',metrics=['accuracy'])
print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)
Running initial batch (compiling tile program)
The run finishes quickly, so only this single line is printed.
# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
    print("Ran in {} seconds".format(time.time() - start))
Ran in 4.241918087005615 seconds
Ran in 8.452141046524048 seconds
Ran in 12.665411949157715 seconds
Ran in 16.849968910217285 seconds
Ran in 21.025720834732056 seconds
Ran in 25.212764024734497 seconds
Ran in 29.405478954315186 seconds
Ran in 33.594977140426636 seconds
Ran in 37.7886438369751 seconds
Ran in 41.98136305809021 seconds
Now re-run plaidml-setup, and this time choose the graphics card metal_amd_radeon_pro_5300m.0 instead of metal_intel(r)_uhd_graphics_630.0.
(venv) tsingj@tsingjdeMacBook-Pro-2 ~ # plaidml-setup
PlaidML Setup (0.6.4)
Thanks for using PlaidML!
Some Notes:
* Bugs and other issues: https://github.com/plaidml/plaidml
* Questions: https://stackoverflow.com/questions/tagged/plaidml
* Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
* PlaidML is licensed under the Apache License 2.0
Default Config Devices:
metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)
Experimental Config Devices:
llvm_cpu.0 : CPU (LLVM)
metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
opencl_amd_radeon_pro_5300m_compute_engine.0 : AMD AMD Radeon Pro 5300M Compute Engine (OpenCL)
opencl_cpu.0 : Intel CPU (OpenCL)
opencl_intel_uhd_graphics_630.0 : Intel Inc. Intel(R) UHD Graphics 630 (OpenCL)
metal_amd_radeon_pro_5300m.0 : AMD Radeon Pro 5300M (Metal)
Using experimental devices can cause poor performance, crashes, and other nastiness.
Enable experimental device support? (y,n)[n]:n
Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:
1 : metal_intel(r)_uhd_graphics_630.0
2 : metal_amd_radeon_pro_5300m.0
Default device? (1,2)[1]:2
Selected device:
metal_amd_radeon_pro_5300m.0
Almost done. Multiplying some matrices...
Tile code:
function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.
Save settings to /Users/tsingj/.plaidml? (y,n)[y]:y
Success!
# Importing PlaidML. Make sure you follow this order
import plaidml.keras
plaidml.keras.install_backend()
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
Notes:
1. With plaidml 0.7.0, the plaidml.keras.install_backend() call reports an error.
2. This step loads Keras through PlaidML, switching the backend engine from TensorFlow to PlaidML.
#!/usr/bin/env python
import numpy as np
import os
import time
import keras
import keras.applications as kapp
from keras.datasets import cifar10
(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
model = kapp.VGG19()
INFO:plaidml:Opening device "metal_amd_radeon_pro_5300m.0"
Note that the graphics card information is again printed on the first execution.
model.compile(optimizer='sgd', loss='categorical_crossentropy',metrics=['accuracy'])
print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)
Running initial batch (compiling tile program)
Again, the run finishes quickly, so only this single line is printed.
# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
    print("Ran in {} seconds".format(time.time() - start))
The output:
Ran in 0.43606019020080566 seconds
Ran in 0.8583459854125977 seconds
Ran in 1.2787911891937256 seconds
Ran in 1.70143723487854 seconds
Ran in 2.1235032081604004 seconds
Ran in 2.5464580059051514 seconds
Ran in 2.9677979946136475 seconds
Ran in 3.390064001083374 seconds
Ran in 3.8117799758911133 seconds
Ran in 4.236911058425903 seconds
The Intel UHD Graphics 630 reports 1536 MB of memory; although it is a GPU, its compute performance here (about 4.2 s per batch) is actually worse than this machine's 6-core CPU (about 0.93 s per batch).
The AMD Radeon Pro 5300M, with 4 GB of memory, runs about twice as fast as the local CPU (about 0.42 s per batch).
From this we can see the real advantage a capable GPU brings to machine-learning workloads.
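Pulling the three runs together, a back-of-the-envelope comparison (the totals printed above, divided by the 10 timed batches):
# Per-batch inference time, from the totals printed above (10 batches each).
cpu_s = 9.29 / 10         # TensorFlow on the 6-core CPU
intel_gpu_s = 41.98 / 10  # PlaidML on the Intel UHD Graphics 630
amd_gpu_s = 4.24 / 10     # PlaidML on the AMD Radeon Pro 5300M
print("AMD GPU vs CPU: {:.1f}x faster".format(cpu_s / amd_gpu_s))       # ~2.2x
print("Intel iGPU vs CPU: {:.1f}x slower".format(intel_gpu_s / cpu_s))  # ~4.5x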