Technology sharing

Day 23 of the Shengsi 25-day training camp

2024-07-12


Win prizes while learning AI? 30 minutes a day for 25 days to unlock the "Ren and Du meridians" of AI (qq.com)

The k-nearest neighbor algorithm for red wine clustering

This experiment mainly introduces how to use MindSpore to carry out a KNN experiment on a given red wine dataset.

1. Experiment objectives

  • Understand the basic concepts of KNN;
  • Learn how to use MindSpore to conduct KNN experiments.

2. Introduction to the principle of the k-nearest neighbor algorithm

K-Nearest Neighbor (KNN) is a non-parametric statistical method for classification and regression, first proposed by Cover and Hart in 1967 (Cover & Hart, 1967), and is one of the most fundamental algorithms in machine learning. It rests on the following idea: to determine the class of a sample, compute its distance to every training sample, find the k training samples nearest to it, count the classes among those k samples, and take the class with the most votes as the classification result. The three basic elements of KNN are:

  • The value of K. The classification of a sample is determined by the "majority vote" of its K nearest neighbors. The smaller K is, the more easily the result is affected by noise; the larger K is, the more blurred the boundaries between classes become.

  • The distance metric, which reflects the similarity between two samples in feature space. Commonly used metrics are the Lp distance (written out explicitly after this list; p = 2 gives the Euclidean distance), the Manhattan distance, the Hamming distance, etc.

  • The classification decision rule, usually majority voting, or majority voting weighted by distance (with weights inversely proportional to distance).
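
For concreteness, the Lp distance mentioned above has the standard form

$$d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p},$$

so p = 2 gives the Euclidean distance and p = 1 the Manhattan distance.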

Non-parametric statistical methods are statistical methods that do not rely on a parametric form of the data distribution (such as the normality assumption of traditional statistics); that is, they make no strict assumptions about how the data are distributed. Compared with parametric statistical methods (such as linear regression or the t-test), non-parametric statistics are more flexible, because they do not assume that the data follow a specific distribution model.

Parametric statistical methods rely on specific assumptions about the data distribution, and use a small set of parameters to describe the shape of that distribution. These parametric forms can greatly simplify models and analyses, but they also require the assumptions and parameters to match the actual data, and can be misleading when the assumptions do not hold. Three examples (with a minimal code illustration after the list):
1. Linear regression:
Assumption: there is a linear relationship between the dependent variable (the response) and the independent variables (the explanatory variables).
Parameters: the regression coefficients (slope and intercept), usually assuming the error term has zero mean, constant variance (homoskedasticity), and a normal distribution.
2. Logistic regression:
Assumption: the dependent variable (a categorical variable) satisfies the logistic regression model, and the regression coefficients are fixed values.
Parameters: the regression coefficients, describing the influence of the independent variables on the dependent variable, estimated by maximum likelihood.
3. t-test:
Assumption: the sample data come from a normally distributed population.
Parameters: the mean and the variance; a one-sample test concerns the population mean, a two-sample test concerns the difference between the means of two populations.
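
To make the contrast concrete, here is a minimal sketch of one parametric method from the list, the two-sample t-test (assuming scipy is available, as it is in the lab environment); the data and parameters here are made up for illustration:

# Minimal sketch: a two-sample t-test, a parametric method that assumes
# both samples come from normally distributed populations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=30)  # sample from N(0, 1)
b = rng.normal(loc=0.5, scale=1.0, size=30)  # sample from N(0.5, 1)
t_stat, p_value = stats.ttest_ind(a, b)      # tests the difference of the two means
print('t = %.3f, p = %.3f' % (t_stat, p_value))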

 

2.1 Classification problems

The prediction process of the algorithm (for classification) is as follows:

(1) Find the k samples nearest to the test sample x_test in the training set and save them in a set N;

(2) Count the number of samples of each class in the set N, denoted $C_i$, $i = 1, 2, 3, \ldots, c$;

(3) The final classification result is the class $\arg\max_i C_i$ (the class whose $C_i$ is largest).

In the procedure above, the value of k matters a great deal. It can be chosen according to the characteristics of the problem and of the data. In a concrete implementation, sample weights can also be taken into account, i.e. each sample carries a different voting weight. This method is called the weighted k-nearest neighbor algorithm, a variant of the k-nearest neighbor algorithm; both variants are sketched in the code below.
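
The three steps above, together with the weighted variant, fit in a few lines of NumPy. This is a minimal illustrative sketch (the function name knn_predict is ours), not the MindSpore implementation built in section 5:

import numpy as np

def knn_predict(x_test, X_train, y_train, k=5, weighted=False):
    # (1) distances from x_test to every training sample
    dist = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nn_idx = np.argsort(dist)[:k]  # indices of the k nearest samples (the set N)
    # (2) accumulate the votes C_i for each class; each vote is 1, or 1/distance
    votes = {}
    for idx in nn_idx:
        w = 1.0 / (dist[idx] + 1e-12) if weighted else 1.0
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    # (3) return the class with the largest (weighted) vote count: argmax C_i
    return max(votes, key=votes.get)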

2.2 Regression problems

Assume the labels of the k training samples nearest to the test sample are $y_i$; the regression prediction for the sample is then

$$\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i,$$

that is, the mean of the labels of all the neighbors.

The regression prediction function with sample weights is

$$\hat{y} = \sum_{i=1}^{k} w_i\, y_i,$$

where $w_i$ is the weight of the $i$-th sample (the weights are typically normalized so that $\sum_{i=1}^{k} w_i = 1$).
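
A regression version of the same hypothetical sketch, covering both the unweighted mean and the weighted sum above (weights is an optional per-training-sample weight array):

def knn_regress(x_test, X_train, y_train, k=5, weights=None):
    # distances to all training samples, then the indices of the k nearest
    dist = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nn_idx = np.argsort(dist)[:k]
    if weights is None:
        return y_train[nn_idx].mean()            # mean of the neighbors' labels
    w = weights[nn_idx] / weights[nn_idx].sum()  # normalize so the weights sum to 1
    return (w * y_train[nn_idx]).sum()           # weighted prediction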

2.3 Distance definition

The implementation of the KNN algorithm depends on the distance between samples, and the most commonly used distance function is the Euclidean distance. For two points $x$ and $y$ in $\mathbb{R}^n$, the Euclidean distance between them is defined as

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$

Note that when using the Euclidean distance, each component of the feature vector should be normalized, to reduce the interference caused by features whose values live on different scales; a minimal sketch of this step follows.
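
A sketch of that normalization step, assuming a NumPy feature matrix X like the one loaded later in this experiment (min-max scaling is one common choice):

# Min-max normalization: rescale every feature (column) to [0, 1], so that
# large-scale attributes (e.g. Proline) do not dominate the Euclidean distance
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min + 1e-12)  # small epsilon avoids division by zero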

Other distance measures include the Mahalanobis distance, the Bhattacharyya distance, etc.

3. Experimental environment

Prerequisite knowledge:

  • Proficiency in using Python.
  • Some knowledge of machine learning theory, such as KNN, unsupervised learning, the Euclidean distance, etc.

Lab environment:

  • MindSpore 2.0 (MindSpore versions are updated regularly, and this guide is revised regularly to match the version);
  • This case supports win_x86 and Linux systems, and can run on CPU/GPU/Ascend.
  • If you run this experiment locally, please refer to the "MindSpore Environment Setup Experiment Manual" to install MindSpore locally.

4. Data processing

4.1 Data preparation

The Wine dataset is one of the most famous datasets in pattern recognition (Wine Data Set). The data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The dataset records the quantities of 13 constituents contained in each of the three types of wine. These 13 attributes are:

  1. Alcohol
  2. Malic acid
  3. Ash
  4. Alcalinity of ash
  5. Magnesium
  6. Total phenols
  7. Flavanoids
  8. Nonflavanoid phenols
  9. Proanthocyanins
  10. Color intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline
Data Set Characteristics: Multivariate
Number of Instances: 178
Attribute Characteristics: Integer, Real
Number of Attributes: 13
Associated Tasks: Classification
Missing Values? No
%%capture captured_output
# The lab environment has mindspore==2.2.14 preinstalled; to switch MindSpore versions, change the version number below
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14
# Check the current mindspore version
!pip show mindspore
Name: mindspore
Version: 2.2.14
Summary: MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
Home-page: https://www.mindspore.cn
Author: The MindSpore Authors
Author-email: [email protected]
License: Apache 2.0
Location: /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages
Requires: asttokens, astunparse, numpy, packaging, pillow, protobuf, psutil, scipy
Required-by: 
from download import download

# Download the red wine dataset
url = "https://ascend-professional-construction-dataset.obs.cn-north-4.myhuaweicloud.com:443/MachineLearning/wine.zip"
path = download(url, "./", kind="zip", replace=True)
Downloading data from https://ascend-professional-construction-dataset.obs.cn-north-4.myhuaweicloud.com:443/MachineLearning/wine.zip (4 kB)

file_sizes: 100%|██████████████████████████| 4.09k/4.09k [00:00<00:00, 2.51MB/s]
Extracting zip file...
Successfully downloaded / unzipped to ./

4.2 Data reading and processing

Import the MindSpore module and auxiliary modules.

Before generating the data, import the required Python libraries.

For now, the os library is used.

Detailed descriptions of the MindSpore modules can be found on the MindSpore API page.

You can configure the information required for execution via context.set_context, such as the execution mode, backend, hardware, and other information.

Import the context module and configure the information required for execution.

%matplotlib inline
import os
import csv
import numpy as np
import matplotlib.pyplot as plt
import mindspore as ms
from mindspore import nn, ops

ms.set_context(device_target="CPU")
Refer to mindspore.set_context: device_target (str) represents the target device to run on; 'Ascend', 'GPU', and 'CPU' are supported. If this parameter is not set, the backend device corresponding to the installed MindSpore package is used. In the lab environment this can also be set to Ascend.

Read the Wine dataset wine.data and view part of the data.
with open('wine.data') as csv_file:
    data = list(csv.reader(csv_file, delimiter=','))
print(data[56:62] + data[130:133])
[['1', '14.22', '1.7', '2.3', '16.3', '118', '3.2', '3', '.26', '2.03', '6.38', '.94', '3.31', '970'], ['1', '13.29', '1.97', '2.68', '16.8', '102', '3', '3.23', '.31', '1.66', '6', '1.07', '2.84', '1270'], ['1', '13.72', '1.43', '2.5', '16.7', '108', '3.4', '3.67', '.19', '2.04', '6.8', '.89', '2.87', '1285'], ['2', '12.37', '.94', '1.36', '10.6', '88', '1.98', '.57', '.28', '.42', '1.95', '1.05', '1.82', '520'], ['2', '12.33', '1.1', '2.28', '16', '101', '2.05', '1.09', '.63', '.41', '3.27', '1.25', '1.67', '680'], ['2', '12.64', '1.36', '2.02', '16.8', '100', '2.02', '1.41', '.53', '.62', '5.75', '.98', '1.59', '450'], ['3', '12.86', '1.35', '2.32', '18', '122', '1.51', '1.25', '.21', '.94', '4.1', '.76', '1.29', '630'], ['3', '12.88', '2.99', '2.4', '20', '104', '1.3', '1.22', '.24', '.83', '5.4', '.74', '1.42', '530'], ['3', '12.81', '2.31', '2.4', '24', '98', '1.15', '1.09', '.27', '.83', '5.7', '.66', '1.36', '560']]
Take the three classes of samples (178 in total), use the 13 attributes of the dataset as the independent variable $X$, and the 3 classes of the dataset as the dependent variable $Y$.
X = np.array([[float(x) for x in s[1:]] for s in data[:178]], np.float32)
Y = np.array([s[0] for s in data[:178]], np.int32)
Take two attributes of the samples at a time for a two-dimensional visualization, to observe the distribution and separability of the samples on each attribute pair.
attrs = ['Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium', 'Total phenols',
         'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 'Color intensity', 'Hue',
         'OD280/OD315 of diluted wines', 'Proline']
plt.figure(figsize=(10, 8))
for i in range(0, 4):
    plt.subplot(2, 2, i+1)
    a1, a2 = 2 * i, 2 * i + 1
    plt.scatter(X[:59, a1], X[:59, a2], label='1')
    plt.scatter(X[59:130, a1], X[59:130, a2], label='2')
    plt.scatter(X[130:, a1], X[130:, a2], label='3')
    plt.xlabel(attrs[a1])
    plt.ylabel(attrs[a2])
    plt.legend()
plt.show()

From the code it can be seen that the first 59 samples belong to class 1, samples 59 through 130 belong to class 2, and the remaining samples belong to class 3.

From the plots, the attribute pairs in the upper-left and lower-right panels separate the classes relatively well; in particular, the boundary between classes 1 and 2 is relatively clear.

Split the dataset into a training set (samples with known classes) and a validation set (samples to be verified) at a ratio of 128:50:
train_idx = np.random.choice(178, 128, replace=False)
test_idx = np.array(list(set(range(178)) - set(train_idx)))
X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]

Breaking down test_idx = np.array(list(set(range(178)) - set(train_idx))) (a toy demonstration follows this list):
1. range(178) generates the sequence of integers from 0 to 177.
2. set(range(178)) converts this integer sequence into a set, a data structure that holds unique elements and supports efficient set operations (union, intersection, difference, etc.).
3. set(train_idx) converts the randomly chosen training index array train_idx into a set.
4. set(range(178)) - set(train_idx) computes the set difference, i.e. the elements that are in set(range(178)) but not in set(train_idx); these are exactly the indices of the test set. This operation computes the difference quickly and efficiently.
5. list(...) converts the result of the difference operation into a list.
6. np.array(...) converts this list into a NumPy array, so it can be used to index the NumPy arrays X and Y.
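
A toy version of the same set difference, shrunk to 6 indices for readability:

train_idx_demo = [0, 2, 4]
test_idx_demo = np.array(list(set(range(6)) - set(train_idx_demo)))
print(test_idx_demo)  # the elements 1, 3, 5 (a set does not guarantee their order)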

 

5. Model construction - computing distances

Using the tile, square, ReduceSum, sqrt, TopK and other operators provided by MindSpore, compute the distances between the input sample x and all the samples X_train with known classes at once through matrix operations, and find the k nearest neighbors.

class KnnNet(nn.Cell):
    def __init__(self, k):
        super(KnnNet, self).__init__()
        self.k = k

    def construct(self, x, X_train):
        # Tile the input x to match the number of samples in X_train
        x_tile = ops.tile(x, (128, 1))
        square_diff = ops.square(x_tile - X_train)
        square_dist = ops.sum(square_diff, 1)
        dist = ops.sqrt(square_dist)
        # -dist means: the larger the value, the closer the sample
        values, indices = ops.topk(-dist, self.k)
        return indices


def knn(knn_net, x, X_train, Y_train):
    x, X_train = ms.Tensor(x), ms.Tensor(X_train)
    indices = knn_net(x, X_train)
    topk_cls = [0] * len(indices.asnumpy())
    for idx in indices.asnumpy():
        topk_cls[Y_train[idx]] += 1
    cls = np.argmax(topk_cls)
    return cls

1. Define the KNN network class KnnNet:
The __init__ function initializes the network and sets k (the number of nearest neighbors to select).
The construct function computes the Euclidean distance between the input sample x and every sample in the training set X_train, and returns the indices of the k samples with the smallest distances (the k nearest neighbors).
2. Define the KNN function knn:
It takes knn_net (an instance of the KNN network), the test sample x, the training samples X_train, and the training labels Y_train as input.
It uses knn_net to find the k nearest neighbors of x, tallies them by the samples' labels, and returns the classification result (the predicted class).

 

6. Model prediction

Verify the effectiveness of the KNN algorithm on the validation set. With k = 5, the validation accuracy is close to 80%, which shows that the KNN algorithm is effective on this 3-class task and can determine the wine cultivar from the 13 attributes.

acc = 0
knn_net = KnnNet(5)
for x, y in zip(X_test, Y_test):
    pred = knn(knn_net, x, X_train, Y_train)
    acc += (pred == y)
    print('label: %d, prediction: %s' % (y, pred))
print('Validation accuracy is %f' % (acc / len(Y_test)))

Compute and print the accuracy on the test set:
Loop over each sample in the test sets X_test and Y_test, and use the knn function to classify each sample.
Count the number of correctly predicted samples, compute the classification accuracy, and print it.

label: 1, prediction: 1
label: 2, prediction: 3
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 2
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 3
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 2
label: 1, prediction: 2
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 2
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 3
label: 3, prediction: 1
label: 3, prediction: 2
label: 3, prediction: 2
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 2
label: 1, prediction: 3
label: 3, prediction: 1
label: 3, prediction: 3
label: 3, prediction: 1
label: 3, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 3
label: 2, prediction: 3
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 1
label: 2, prediction: 3
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
Validation accuracy is 0.660000

The accuracy will not reach 80% on every run: because the training/validation split is random, the result varies between runs, as the second run below shows (a sketch for averaging over several splits follows that output).

label: 1, prediction: 1
label: 3, prediction: 3
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 3
label: 3, prediction: 3
label: 3, prediction: 2
label: 3, prediction: 3
label: 1, prediction: 1
label: 3, prediction: 3
label: 1, prediction: 1
label: 1, prediction: 3
label: 3, prediction: 3
label: 1, prediction: 3
label: 1, prediction: 1
label: 3, prediction: 3
label: 1, prediction: 1
label: 1, prediction: 1
label: 3, prediction: 3
label: 3, prediction: 3
label: 3, prediction: 2
label: 1, prediction: 1
label: 3, prediction: 3
label: 1, prediction: 1
label: 3, prediction: 3
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 1, prediction: 1
label: 2, prediction: 3
label: 2, prediction: 1
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 3
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 3
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 2
label: 2, prediction: 3
label: 2, prediction: 2
Validation accuracy is 0.820000
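
Because the split is random, a more stable estimate averages the accuracy over several random splits. A minimal sketch, reusing X, Y, knn and knn_net from the cells above (n_runs is an arbitrary choice):

# Average the validation accuracy over several random 128:50 splits
n_runs = 10
accs = []
for _ in range(n_runs):
    tr = np.random.choice(178, 128, replace=False)
    te = np.array(list(set(range(178)) - set(tr)))
    correct = sum(knn(knn_net, x, X[tr], Y[tr]) == y for x, y in zip(X[te], Y[te]))
    accs.append(correct / len(Y[te]))
print('mean accuracy over %d runs: %.3f' % (n_runs, np.mean(accs)))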

Experiment summary

This experiment uses MindSpore to implement the KNN algorithm to solve a 3-class classification problem. Taking the three classes of samples from the Wine dataset, we split them into samples with known classes (the training set) and samples to be verified (the validation set), and predict the class from the 13 wine attributes.

Data preparation: download the data from the Wine dataset website or from Huawei Cloud OBS, then read and process it.

Data processing: divide the data into the independent variables (the 13 attributes) and the dependent variable (the 3 classes), and visualize it to observe the sample distribution.

Model construction: define the KNN network structure, use the operators provided by MindSpore to compute the distances between the input sample and the training samples, and find the k nearest neighbors.

Model prediction: make predictions on the validation set and compute the prediction accuracy.

The experimental results show that the classification accuracy of the KNN algorithm on the Wine dataset is close to 80% (66% and 82% in the two runs above).