2024-07-12
The support vector machine (SVM) is a robust and versatile machine learning model suited to classification, regression, and outlier detection, for both linear and nonlinear problems. This article introduces the support vector machine algorithm and its implementation in scikit-learn, and briefly explores principal component analysis (PCA) and its application in scikit-learn.
The support vector machine (SVM) algorithm is widely used in machine learning, favored because it delivers significant accuracy with comparatively little computation. SVM can be used for both classification and regression tasks, but it is most widely applied to classification problems.
The goal of a support vector machine is to find a hyperplane in N-dimensional space (where N is the number of features) that clearly separates the data points. This hyperplane should separate points of different classes while keeping them as far from the hyperplane as possible, so that the classification remains robust.
Many hyperplanes may separate the data points. The goal is to choose the hyperplane with the maximum margin, i.e., the largest distance between the two classes. Maximizing the margin helps improve classification accuracy.
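For reference, the maximum-margin objective can be written as the standard hard-margin optimization problem (a textbook formulation; the tutorial itself does not spell it out):

\min_{w,b} \; \tfrac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i \left( w^\top x_i + b \right) \ge 1, \quad i = 1, \dots, N

Because the distance between the two margin boundaries is 2 / \lVert w \rVert, minimizing \lVert w \rVert maximizes the margin.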
A hyperplane is a decision boundary that divides the data points. Points on either side of the hyperplane can be assigned to different classes. The dimensionality of the hyperplane depends on the number of features: with 2 input features the hyperplane is a straight line, with 3 it is a two-dimensional plane, and once the number of features exceeds 3 the hyperplane becomes difficult to visualize intuitively.
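As a minimal sketch (the weight vector and points below are made up for illustration), the side of the hyperplane a point falls on is simply the sign of w·x + b:

import numpy as np

w = np.array([2.0, -1.0])   # normal vector of the hyperplane (illustrative values)
b = 0.5                     # intercept
points = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, 0.0]])

sides = np.sign(points @ w + b)         # +1 or -1: which side each point falls on
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries
print(sides)          # [ 1. -1.  1.]
print(margin_width)   # ~0.894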
Support vectors are the data points closest to the hyperplane; they influence the position and orientation of the hyperplane. Using these support vectors, we maximize the classifier's margin. Removing the support vectors changes the position of the hyperplane, so they are crucial to building the SVM.
In logistic regression, we use the sigmoid function to squash the output of a linear function into the range [0, 1] and assign labels based on a threshold (0.5). In SVM, we use the raw output of the linear function to decide the class: if the output is greater than 1, the point belongs to one class; if it is less than -1, it belongs to the other. SVM thus forms a margin range of [-1, 1] by setting the output thresholds to 1 and -1.
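The two decision rules can be contrasted in a small sketch (the raw outputs below are made-up values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # raw outputs of a linear function w.x + b

# Logistic regression: squash into [0, 1], then threshold at 0.5
logreg_labels = (sigmoid(z) >= 0.5).astype(int)

# SVM: use the raw output; values between -1 and 1 fall inside the margin
svm_labels = np.where(z >= 1, 1, np.where(z <= -1, -1, 0))  # 0 marks "inside the margin"

print(logreg_labels)  # [0 0 1 1 1]
print(svm_labels)     # [-1  0  0  0  1]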
Next, we use a support vector machine to predict whether a breast cancer diagnosis is benign or malignant.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style('whitegrid')
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
# Create a DataFrame
col_names = list(cancer.feature_names)
col_names.append('target')
df = pd.DataFrame(np.c_[cancer.data, cancer.target], columns=col_names)
df.head()
df.info()

print(cancer.target_names)
# ['malignant', 'benign']

# Statistical summary of the data:
df.describe()
sns.pairplot(df, hue='target', vars=[
    'mean radius', 'mean texture', 'mean perimeter', 'mean area',
    'mean smoothness', 'mean compactness', 'mean concavity',
    'mean concave points', 'mean symmetry', 'mean fractal dimension'
])
sns.countplot(x=df['target'], label="Count")
plt.figure(figsize=(10, 8))
sns.scatterplot(x='mean area', y='mean smoothness', hue='target', data=df)
plt.figure(figsize=(20,10))
sns.heatmap(df.corr(), annot=True)
In machine learning, model training is a critical step toward solving a problem. Below we show how to use scikit-learn to train models and demonstrate the performance of support vector machines (SVM) with different kernels.
First, the data must be prepared and preprocessed. The following code is an example of data preprocessing:
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
X = df.drop('target', axis=1)
y = df.target
print(f"'X' shape: {X.shape}")
print(f"'y' shape: {y.shape}")
# 'X' shape: (569, 30)
# 'y' shape: (569,)
pipeline = Pipeline([
    ('min_max_scaler', MinMaxScaler()),
    ('std_scaler', StandardScaler())
])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
In this code we use MinMaxScaler and StandardScaler to scale the data, and the data is split into training and test sets, with 30% of it reserved for testing. Note that the pipeline is only defined here; it is applied to the split data later, just before the models are retrained on scaled features.
To evaluate model performance, we define a print_score function that displays the accuracy score, classification report, and confusion matrix for the training or test results:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import pandas as pd

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("Test Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
The support vector machine (SVM) is a powerful classification algorithm whose performance depends on its hyperparameters. The main SVM parameters and their impact on model performance are:
C: the regularization parameter. Smaller values widen the margin but tolerate more training errors; larger values fit the training data more tightly.
gamma: the kernel coefficient used by the 'rbf' and 'poly' kernels. It controls how far the influence of a single training example reaches.
degree: the degree of the polynomial kernel ('poly'); it is ignored by the other kernels.
Optimal hyperparameter values are found through grid search. The linear kernel SVM is suitable for many situations, especially when the dataset has a large number of features. The following is example code using a linear kernel SVM:
from sklearn.svm import LinearSVC
model = LinearSVC(loss='hinge', dual=True)
model.fit(X_train, y_train)
print_score(model, X_train, y_train, X_test, y_test, train=True)
print_score(model, X_train, y_train, X_test, y_test, train=False)
The training and test results are as follows:
Training results:
Accuracy Score: 86.18%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 1.000000 0.819079 0.861809 0.909539 0.886811
recall 0.630872 1.000000 0.861809 0.815436 0.861809
f1-score 0.773663 0.900542 0.861809 0.837103 0.853042
support 149.000000 249.000000 0.861809 398.000000 398.000000
_______________________________________________
Confusion Matrix:
[[ 94 55]
[ 0 249]]
Test results:
Accuracy Score: 89.47%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 1.000000 0.857143 0.894737 0.928571 0.909774
recall 0.714286 1.000000 0.894737 0.857143 0.894737
f1-score 0.833333 0.923077 0.894737 0.878205 0.890013
support 63.000000 108.000000 0.894737 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 45 18]
[ 0 108]]
The polynomial kernel SVM is suited to nonlinear data.
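For reference, scikit-learn's polynomial kernel computes similarity as (standard definition, using the gamma, coef0, and degree parameters of SVC):

K(x, x') = \left( \gamma \, x^\top x' + \mathrm{coef0} \right)^{\mathrm{degree}}

The following is example code using a second-degree polynomial kernel: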
from sklearn.svm import SVC
model = SVC(kernel='poly', degree=2, gamma='auto', coef0=1, C=5)
model.fit(X_train, y_train)
print_score(model, X_train, y_train, X_test, y_test, train=True)
print_score(model, X_train, y_train, X_test, y_test, train=False)
The training and test results are as follows:
Training results:
Accuracy Score: 96.98%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.985816 0.961089 0.969849 0.973453 0.970346
recall 0.932886 0.991968 0.969849 0.962427 0.969849
f1-score 0.958621 0.976285 0.969849 0.967453 0.969672
support 149.000000 249.000000 0.969849 398.000000 398.000000
_______________________________________________
Confusion Matrix:
[[139 10]
[ 2 247]]
Test results:
Accuracy Score: 97.08%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.967742 0.972477 0.97076 0.970109 0.970733
recall 0.952381 0.981481 0.97076 0.966931 0.970760
f1-score 0.960000 0.976959 0.97076 0.968479 0.970711
support 63.000000 108.000000 0.97076 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 60 3]
[ 2 106]]
The radial basis function (RBF) kernel is likewise suited to handling nonlinear data.
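For reference, the RBF kernel measures similarity as (standard definition):

K(x, x') = \exp\left( -\gamma \, \lVert x - x' \rVert^2 \right)

Larger gamma values make the decision boundary more sensitive to individual training points. The example code below uses the RBF kernel: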
model = SVC(kernel='rbf', gamma=0.5, C=0.1)
model.fit(X_train, y_train)
print_score(model, X_train, y_train, X_test, y_test, train=True)
print_score(model, X_train, y_train, X_test, y_test, train=False)
The training and test results are as follows:
Training results:
Accuracy Score: 62.56%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.0 0.625628 0.625628 0.312814 0.392314
recall 0.0 1.000000 0.625628 0.500000 0.625628
f1-score 0.0 0.769231 0.625628 0.384615 0.615385
support 149.0 249.0 0.625628 398.0 398.0
_______________________________________________
Confusion Matrix:
[[ 0 149]
[ 0 249]]
Test results:
Accuracy Score: 64.97%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.0 0.655172 0.649661 0.327586 0.409551
recall 0.0 1.000000 0.649661 0.500000 0.649661
f1-score 0.0 0.792453 0.649661 0.396226 0.628252
support 63.0 108.0 0.649661 171.0 171.0
_______________________________________________
Confusion Matrix:
[[ 0 63]
[ 0 108]]
Through the training and evaluation process above, we can see the performance differences between the SVM kernels. The linear kernel SVM performs well in both accuracy and training time, and it is suited to high-dimensional data. The polynomial kernel and RBF kernel SVMs perform better on nonlinear data, but they can overfit under certain parameter settings. Choosing appropriate kernels and hyperparameters is the key to improving model performance.
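To compare kernels more systematically, the cross_val_score helper imported earlier can be used. A minimal sketch (the parameter values are illustrative, not the article's):

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# 5-fold cross-validated accuracy for each kernel on the training data
for kernel in ['linear', 'poly', 'rbf']:
    scores = cross_val_score(SVC(kernel=kernel, gamma='scale'), X_train, y_train, cv=5)
    print(f"{kernel}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")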
When preparing data for SVM, two points are worth noting. Numerical input: SVM assumes that the input data is numeric. If the input variables are categorical, they must be converted into binary dummy variables (one per category), as in the sketch below.
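A minimal sketch of such a conversion (the 'color' column is hypothetical and not part of the cancer dataset):

import pandas as pd

# One binary column per category: color_blue, color_green, color_red
cat_df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})
dummies = pd.get_dummies(cat_df, columns=['color'])
print(dummies)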
Binary classification: basic SVM is designed for binary classification problems. Although SVM is mostly used for binary classification, extended versions also handle regression and multi-class classification, as the sketch after this note shows.
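For instance, scikit-learn's SVC handles multi-class problems out of the box via a one-vs-one scheme. A short sketch (using sklearn's iris dataset, which is not used elsewhere in this article):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Three classes; SVC trains one-vs-one classifiers internally
X_iris, y_iris = load_iris(return_X_y=True)
clf = SVC(kernel='rbf', gamma='scale').fit(X_iris, y_iris)
print(clf.predict(X_iris[:5]))  # multi-class predictions, no extra wrapping needed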
Next, we apply the preprocessing pipeline defined earlier to scale the training and test data:
X_train = pipeline.fit_transform(X_train)
X_test = pipeline.transform(X_test)
The following shows the training and test results of the different SVM kernels on the scaled data:
Linearis Kernel SVM
print("=======================Linear Kernel SVM==========================")
model = SVC(kernel='linear')
model.fit(X_train, y_train)
print_score(model, X_train, y_train, X_test, y_test, train=True)
print_score(model, X_train, y_train, X_test, y_test, train=False)
Training results:
Accuracy Score: 98.99%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 1.000000 0.984190 0.98995 0.992095 0.990109
recall 0.973154 1.000000 0.98995 0.986577 0.989950
f1-score 0.986395 0.992032 0.98995 0.989213 0.989921
support 149.000000 249.000000 0.98995 398.000000 398.000000
_______________________________________________
Confusion Matrix:
[[145 4]
[ 0 249]]
Test results:
Accuracy Score: 97.66%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.968254 0.981481 0.976608 0.974868 0.976608
recall 0.968254 0.981481 0.976608 0.974868 0.976608
f1-score 0.968254 0.981481 0.976608 0.974868 0.976608
support 63.000000 108.000000 0.976608 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 61 2]
[ 2 106]]
print("=======================Polynomial Kernel SVM==========================")
from sklearn.svm import SVC
model = SVC(kernel='poly', degree=2, gamma='auto')
model.fit(X_train, y_train)
print_score(model, X_train, y_train, X_test, y_test, train=True)
print_score(model, X_train, y_train, X_test, y_test, train=False)
Training results:
Accuracy Score: 85.18%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.978723 0.812500 0.851759 0.895612 0.874729
recall 0.617450 0.991968 0.851759 0.804709 0.851759
f1-score 0.757202 0.893309 0.851759 0.825255 0.842354
support 149.000000 249.000000 0.851759 398.000000 398.000000
_______________________________________________
Confusion Matrix:
[[ 92 57]
[ 2 247]]
Test results:
Accuracy Score: 82.46%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.923077 0.795455 0.824561 0.859266 0.842473
recall 0.571429 0.972222 0.824561 0.771825 0.824561
f1-score 0.705882 0.875000 0.824561 0.790441 0.812693
support 63.000000 108.000000 0.824561 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 36 27]
[ 3 105]]
print("=======================Radial Kernel SVM==========================")
from sklearn.svm import SVC
model = SVC(kernel='rbf', gamma=1)
model.fit(X_train, y_train)
print_score(model, X_train, y_train, X_test, y_test, train=True)
print_score(model, X_train, y_train, X_test, y_test, train=False)
Training results:
Accuracy Score: 100.00%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 1.0 1.0 1.0 1.0 1.0
recall 1.0 1.0 1.0 1.0 1.0
f1-score 1.0 1.0 1.0 1.0 1.0
support 149.0 249.0 1.0 398.0 398.0
_______________________________________________
Confusion Matrix:
[[149 0]
[ 0 249]]
Test results:
Accuracy Score: 63.74%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 1.000000 0.635294 0.637427 0.817647 0.769659
recall 0.015873 1.000000 0.637427 0.507937 0.637427
f1-score 0.031250 0.776978 0.637427 0.404114 0.502236
support 63.000000 108.000000 0.637427 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 1 62]
[ 0 108]]
The RBF kernel with gamma=1 overfits severely: training accuracy is perfect while test accuracy collapses. To find better hyperparameter values, we run a grid search with cross-validation:
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.01, 0.1, 0.5, 1, 10, 100],
              'gamma': [1, 0.75, 0.5, 0.25, 0.1, 0.01, 0.001],
              'kernel': ['rbf', 'poly', 'linear']}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=1, cv=5)
grid.fit(X_train, y_train)
best_params = grid.best_params_
print(f"Best params: {best_params}")
svm_clf = SVC(**best_params)
svm_clf.fit(X_train, y_train)
print_score(svm_clf, X_train, y_train, X_test, y_test, train=True)
print_score(svm_clf, X_train, y_train, X_test, y_test, train=False)
Training results:
Accuracy Score: 98.24%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.986301 0.980159 0.982412 0.983230 0.982458
recall 0.966443 0.991968 0.982412 0.979205 0.982412
f1-score 0.976271 0.986028 0.982412 0.981150 0.982375
support 149.000000 249.000000 0.982412 398.000000 398.000000
_______________________________________________
Confusion Matrix:
[[144 5]
[ 2 247]]
Test results:
Accuracy Score: 98.25%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.983871 0.981651 0.982456 0.982761 0.982469
recall 0.968254 0.990741 0.982456 0.979497 0.982456
f1-score 0.976000 0.986175 0.982456 0.981088 0.982426
support 63.000000 108.000000 0.982456 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 61 2]
[ 1 107]]
Principal component analysis (PCA) is a technique for linear dimensionality reduction that projects the data into a lower-dimensional space.
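To make the idea of a linear projection concrete, here is a minimal numpy sketch (toy data; PCA computed as a projection onto the leading eigenvectors of the covariance matrix):

import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))        # toy data with 5 features
X_centered = X_demo - X_demo.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)    # 5 x 5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
top2 = eigvecs[:, ::-1][:, :2]            # two leading principal directions
X_2d = X_centered @ top2                  # project onto a 2-D subspace
print(X_2d.shape)                         # (100, 2)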
Since high-dimensional data is difficult to visualize directly, we can use PCA to find the first two principal components and visualize the data in two-dimensional space. To achieve this, the data must first be standardized so that each feature has unit variance.
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Standardize the data
scaler = StandardScaler()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Reduce the data to two dimensions with PCA
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
# Visualize the first two principal components
plt.figure(figsize=(8,6))
plt.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap='plasma')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()
Using the first two principal components, we can easily distinguish the different classes of data points in two-dimensional space.
Although dimensionality reduction is powerful, the meaning of the components is hard to interpret directly. Each component corresponds to a combination of the original features, which can be obtained from the fitted PCA object. The relevant attributes include pca.components_ (the weight of each original feature in each component) and pca.explained_variance_ratio_ (the fraction of variance each component captures).
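A brief sketch of inspecting the PCA object fitted above (pca and cancer come from the earlier code; the heatmap is one illustrative way to view the weights):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Each row of pca.components_ expresses one component as weights over the original features
components_df = pd.DataFrame(pca.components_, columns=cancer.feature_names)
print(pca.explained_variance_ratio_)  # fraction of variance captured by each component

plt.figure(figsize=(12, 4))
sns.heatmap(components_df, cmap='plasma')
plt.show()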
When training a model with a support vector machine (SVM), the hyperparameters need to be tuned to obtain the best model. The following is example code that tunes the SVM parameters with grid search (GridSearchCV); note that at this point X_train and X_test hold the two principal components computed above, so the scores differ from the earlier grid search on the full feature set:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
# Define the parameter grid
param_grid = {'C': [0.01, 0.1, 0.5, 1, 10, 100],
              'gamma': [1, 0.75, 0.5, 0.25, 0.1, 0.01, 0.001],
              'kernel': ['rbf', 'poly', 'linear']}
# Grid search
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=1, cv=5)
grid.fit(X_train, y_train)
best_params = grid.best_params_
print(f"Best params: {best_params}")
# Train the model with the best parameters and evaluate it
svm_clf = SVC(**best_params)
svm_clf.fit(X_train, y_train)
print_score(svm_clf, X_train, y_train, X_test, y_test, train=True)
print_score(svm_clf, X_train, y_train, X_test, y_test, train=False)
Training results:
Accuracy Score: 96.48%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.978723 0.957198 0.964824 0.967961 0.965257
recall 0.926174 0.987952 0.964824 0.957063 0.964824
f1-score 0.951724 0.972332 0.964824 0.962028 0.964617
support 149.000000 249.000000 0.964824 398.000000 398.000000
_______________________________________________
Confusion Matrix:
[[138 11]
[ 3 246]]
Test results:
Accuracy Score: 96.49%
_______________________________________________
CLASSIFICATION REPORT:
0.0 1.0 accuracy macro avg weighted avg
precision 0.967213 0.963636 0.964912 0.965425 0.964954
recall 0.936508 0.981481 0.964912 0.958995 0.964912
f1-score 0.951613 0.972477 0.964912 0.962045 0.964790
support 63.000000 108.000000 0.964912 171.000000 171.000000
_______________________________________________
Confusion Matrix:
[[ 59 4]
[ 2 106]]
In this chapter, we learned the following: the principles of support vector machines (hyperplanes, maximum margins, and support vectors); how to train and evaluate SVM models with linear, polynomial, and RBF kernels in scikit-learn; the effect of feature scaling and of hyperparameter tuning with GridSearchCV on model performance; and how to use principal component analysis (PCA) for dimensionality reduction and visualization.
Reference: Support Vector Machine & PCA Tutorial for Beginner