2024-07-08
Scikit-learn (sklearn) is a popular machine learning library that provides many tools for data mining and data analysis. This basic tutorial shows how to use sklearn to preprocess data, train a model, and evaluate it.
First, make sure you have installed the sklearn library. You can install it using pip:
```bash
pip install scikit-learn
```
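To confirm that the installation worked, you can run a quick check like the one below. It assumes only that scikit-learn imports without errors. `sklearn.show_versions()` also prints the versions of Python and the main dependencies, which is useful when reporting problems.

```python
# Minimal check that scikit-learn is importable after installation.
import sklearn

# __version__ holds the installed version string, e.g. "1.5.0".
print(sklearn.__version__)

# Prints Python, platform and dependency (NumPy, SciPy, ...) versions.
sklearn.show_versions()
```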
Importing sklearn is usually done in the following way:
```python
import sklearn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
```
sklearn includes several built-in standard datasets for practice and learning. For example, we can load the Iris dataset:
```python
iris = datasets.load_iris()
X = iris.data    # feature data
y = iris.target  # target labels
```
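Looking at what was loaded helps make sense of `X` and `y`. The short sketch below prints the array shape, the feature and class names, and the number of samples per class. The Iris dataset has 150 samples, 4 numeric features, and 3 classes of 50 samples each.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # feature matrix, one row per flower
y = iris.target  # integer class labels 0, 1, 2

print(X.shape)            # (150, 4)
print(iris.feature_names)  # names of the 4 measurement columns
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))     # [50 50 50], samples per class
```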
Before training a model, you usually need to preprocess the data. Common steps are standardization, normalization, and feature selection.
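The sketch below illustrates normalization and feature selection. Two assumptions are made here. First, "normalization" is taken to mean min-max scaling to [0, 1]. sklearn's `Normalizer`, which rescales each sample (row) to unit norm, is another common meaning. Second, features are selected with an ANOVA F-test, and `k=2` is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_classif

X, y = datasets.load_iris(return_X_y=True)

# Normalization (min-max scaling): rescale each feature column to [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)
print(np.round(X_minmax.min(axis=0), 6), np.round(X_minmax.max(axis=0), 6))

# Feature selection: keep the k features with the highest ANOVA F-score
# with respect to the class labels. k=2 is only an example value.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)        # (150, 2)
print(selector.get_support())  # boolean mask marking the kept columns
```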
Standardizing Data:
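```python
scaler = StandardScaler()
# Learns each feature's mean and std from all of X. For a strict evaluation,
# fit on the training split only so no test-set information leaks in.
X_scaled = scaler.fit_transform(X)
```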
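`StandardScaler` applies z = (x − μ) / σ to each feature column. Here μ is the column mean and σ is the population standard deviation (ddof=0). The following sketch checks this against a manual NumPy computation and inspects the learned `mean_` and `scale_` attributes:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Manual z-score per column; np.std defaults to the population std (ddof=0).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_scaled, X_manual))  # True

# The fitted statistics are stored on the scaler for reuse via transform().
print(np.allclose(scaler.mean_, X.mean(axis=0)))  # True
print(np.allclose(scaler.scale_, X.std(axis=0)))  # True
```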
Split the dataset into training and test sets, usually with the train_test_split function:
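```python
# Hold out 30% of the samples for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
```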
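In the steps above, the scaler was fit on the full dataset before splitting, so the test samples influenced the scaling statistics. The sketch below avoids this. It splits the raw data first, fits the scaler on the training portion only, and reuses the fitted scaler on the test portion. `stratify=y` is an optional addition not in the original tutorial. It keeps the class proportions equal across the two splits.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# Split the raw (unscaled) features first.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit on the training split only, then apply the same transformation
# to the test split, so no test statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)  # (105, 4) (45, 4)
```

With 150 samples and `test_size=0.3`, 45 samples go to the test set and 105 to the training set.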
Choose an appropriate model to train, such as a support vector machine (SVM):
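```python
from sklearn.svm import SVC

# Linear-kernel SVM; C is the regularization parameter (smaller C = stronger regularization).
model = SVC(kernel='linear', C=1.0)
model.fit(X_train, y_train)
```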
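SVC is only one option. All sklearn classifiers share the same `fit`, `predict`, and `score` methods, so swapping models means changing a single line. The sketch below compares three classifiers on the same split. `LogisticRegression` and `KNeighborsClassifier` are illustrative choices that the original tutorial does not use. A single 45-sample test set is too small to reliably pick a winner.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Illustrative candidate list; any sklearn classifier could be added here.
models = {
    "SVC (linear)": SVC(kernel="linear", C=1.0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors (k=5)": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For classifiers, score() returns mean accuracy on the given data.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```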
Use the test set to evaluate the model's performance. You can use metrics such as accuracy:
```python
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
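Accuracy alone does not show which classes the model confuses with each other. The sketch below adds a confusion matrix and a per-class report, using the same split parameters and model settings as above. One difference is assumed: the scaler is fit on the training split only. The sketch also confirms that accuracy equals the fraction of correct predictions.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
scaler = StandardScaler().fit(X_train)
model = SVC(kernel="linear", C=1.0).fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

# Accuracy is the fraction of predictions that match the true labels.
accuracy = accuracy_score(y_test, y_pred)
print(np.isclose(accuracy, np.mean(y_pred == y_test)))  # True

# Rows are true classes, columns are predicted classes;
# correct predictions lie on the diagonal.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1 score.
print(classification_report(y_test, y_pred, target_names=list(iris.target_names)))
```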
Use a cross-validated grid search to tune the model's hyperparameters:
```python
from sklearn.model_selection import GridSearchCV

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = SVC()
# Evaluates every kernel/C combination with cross-validation (5 folds by default).
clf = GridSearchCV(svc, parameters)
clf.fit(X_train, y_train)
print(clf.best_params_)
```
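The grid above covers 2 kernels × 2 values of C, giving 4 combinations. The sketch below is a variant of that search, not the original code. It places `StandardScaler` and `SVC` in a `Pipeline`, so scaling is re-fit inside each cross-validation fold. It also sets `cv=5` explicitly and reports both the best cross-validated score and the refit model's test accuracy. The `svc__` prefix follows sklearn's `step__parameter` naming convention for pipeline parameters.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# make_pipeline names each step after its lowercased class name ("standardscaler", "svc").
pipe = make_pipeline(StandardScaler(), SVC())

# Hyperparameters of a pipeline step are addressed as <step>__<param>.
param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [1, 10]}

# 5-fold cross-validation for each of the 4 combinations.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Mean CV accuracy of best setting: {search.best_score_:.3f}")
# With refit=True (the default), the best setting is retrained on all of X_train.
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```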
This tutorial covered the basic sklearn workflow: loading data, preprocessing, splitting, training, evaluating, and tuning hyperparameters. sklearn offers many more tools and algorithms for a wide range of machine learning problems. Which ones to use depends on your data and your task. The official sklearn documentation and examples are a good place to continue learning.