sklearn basic tutorial

2024-07-08

Scikit-learn (sklearn) is a popular machine learning library that provides many tools for data mining and data analysis. The following basic tutorial shows how to use sklearn for data preprocessing, model training, and evaluation.

1. Installation and import

First, make sure the scikit-learn library is installed. The package is named scikit-learn on PyPI but is imported as sklearn. You can install it with pip:

pip install scikit-learn
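To check that the installation worked, you can import the package and print its version. This is a small sketch. `sklearn.show_versions()` prints a more detailed report of the environment (Python, NumPy, SciPy and so on), which is useful when reporting problems.

```python
# Confirm that scikit-learn is importable and check which version is installed.
import sklearn

print(sklearn.__version__)  # the exact version string depends on your installation

# Optional: print a fuller environment report (Python, NumPy, SciPy versions, etc.)
sklearn.show_versions()
```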

The modules used in this tutorial are usually imported as follows:

import sklearn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

2. Loading the dataset

sklearn includes several built-in standard datasets for practice and learning. For example, you can load the Iris dataset:

iris = datasets.load_iris()
X = iris.data    # feature data
y = iris.target  # target data (class labels)
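A quick way to see what load_iris returns is to inspect the arrays it provides. The sketch below prints the shape of the feature matrix, the feature and class names, and how many samples belong to each class. Iris has 150 samples, 4 features, and 3 classes of 50 samples each.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # feature matrix, shape (n_samples, n_features)
y = iris.target  # integer class labels 0, 1, 2

print(X.shape)             # (150, 4)
print(iris.feature_names)  # the four flower measurements, in centimetres
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))      # [50 50 50]: 50 samples per class
```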

3. Data Preprocessing

Before training a model, the data usually needs to be preprocessed, for example by standardization, normalization, or feature selection.

Standardizing the data (rescaling each feature to zero mean and unit variance):

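scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learn each feature's mean and std, then apply them

For simplicity, the scaler here is fitted on the whole dataset. In practice, fit it on the training set only and use it to transform the test set as well, so that no information from the test data leaks into preprocessing.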
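StandardScaler computes, for each column, z = (x - mean) / std, using the population standard deviation. The sketch below checks this against a manual NumPy computation and, for comparison, shows MinMaxScaler, one common form of the normalization mentioned above, which rescales each column to the range [0, 1].

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = datasets.load_iris().data

# StandardScaler: z = (x - mean) / std per column, with the population std (ddof=0)
X_scaled = StandardScaler().fit_transform(X)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_scaled, X_manual))        # True

# After scaling, every column has mean ~0 and standard deviation ~1
print(np.allclose(X_scaled.mean(axis=0), 0))  # True
print(np.allclose(X_scaled.std(axis=0), 1))   # True

# MinMaxScaler instead maps each column's minimum to 0 and maximum to 1
X_minmax = MinMaxScaler().fit_transform(X)
print(np.allclose(X_minmax.min(axis=0), 0), np.allclose(X_minmax.max(axis=0), 1))  # True True
```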

4. Splitting into training and test sets

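Split the dataset into a training set and a test set, typically with the train_test_split function. Here 30% of the samples are held out for testing, and random_state=42 makes the split reproducible: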

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
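A sketch of the leakage-free order of operations: split the raw features first, then fit the scaler on the training set only and reuse its statistics to transform the test set. It uses the same test_size=0.3 and random_state=42 as above; the names X_train_scaled and X_test_scaled are only illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# 1. Split the raw (unscaled) features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# 2. Learn the mean and std from the training set only
scaler = StandardScaler().fit(X_train)

# 3. Apply those same training statistics to both sets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training columns are centred exactly; test columns are only approximately centred
print(np.allclose(X_train_scaled.mean(axis=0), 0))  # True
```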

5. Model selection and training

Choose an appropriate model and train it, for example a support vector machine (SVM):

from sklearn.svm import SVC

model = SVC(kernel='linear', C=1.0)  # linear-kernel SVM with regularization parameter C=1.0
model.fit(X_train, y_train)
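Because sklearn classifiers share the same fit/predict/score interface, trying a different model usually means changing only the constructor call. The sketch below trains the linear SVC from this step alongside an RBF-kernel SVC and a k-nearest-neighbours classifier (both picked here purely for illustration) on the same scaled split. score() returns the mean accuracy on the data it is given.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale using training-set statistics only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Only the constructor differs; the training and evaluation calls are identical
models = {
    'SVC (linear)': SVC(kernel='linear', C=1.0),
    'SVC (rbf)': SVC(kernel='rbf', C=1.0),
    'k-NN (k=5)': KNeighborsClassifier(n_neighbors=5),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f'{name}: {scores[name]:.3f}')
```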

6. Model Evaluation

Use the test set to evaluate the model's performance, with metrics such as accuracy:

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
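Accuracy alone does not show which classes get confused with each other. confusion_matrix and classification_report from sklearn.metrics break the result down per class. This sketch rebuilds the split and the linear SVC from the earlier steps (scaling on the training set only) so that it runs on its own.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Fit the scaler on the training data only, then train the same linear SVC
scaler = StandardScaler().fit(X_train)
model = SVC(kernel='linear', C=1.0).fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1 score, labelled with the species names
print(classification_report(y_test, y_pred, target_names=list(iris.target_names)))

# Accuracy equals the diagonal (correct predictions) divided by all test samples
print(cm.trace() / cm.sum(), accuracy_score(y_test, y_pred))
```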

7. Parameter Tuning and Cross-Validation

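Use cross-validation to tune the model's hyperparameters. GridSearchCV tries every combination in a parameter grid and scores each one with cross-validation (5-fold by default):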

from sklearn.model_selection import GridSearchCV

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(X_train, y_train)
print(clf.best_params_)
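A common refinement is to put the scaler and the SVC into a Pipeline and search over the whole pipeline, so that each cross-validation fold fits the scaler only on its own training portion. The sketch below uses the same grid as above; inside a pipeline, parameters are addressed as step name, double underscore, parameter name (the step names 'scaler' and 'svc' are chosen here for illustration). It also uses cross_val_score to show the per-fold scores of the best configuration.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The scaler is re-fitted on the training folds of every CV split
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Same grid as above, with parameters prefixed by the step name 'svc'
param_grid = {'svc__kernel': ['linear', 'rbf'], 'svc__C': [1, 10]}
search = GridSearchCV(pipe, param_grid, cv=5)  # 2 x 2 = 4 candidates, 5 folds each
search.fit(X_train, y_train)

print(search.best_params_)           # best combination found
print(search.best_score_)            # its mean cross-validated accuracy
print(search.score(X_test, y_test))  # best pipeline (refit on all training data) on the test set

# Per-fold accuracy for the best configuration
scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=5)
print(scores.mean(), scores.std())
```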

This simple tutorial shows how to use sklearn for basic machine learning tasks. sklearn provides a wide range of tools and algorithms for many kinds of machine learning problems; which ones to use depends on your data and your task. For more depth, explore the official sklearn documentation and examples.
