AI-powered search & chat for Data / Computer Science Students
8.1. Strategies to scale computationally: bigger data
For some applications, the number of examples, features (or both) and/or the speed at which they need to be processed are challenging for traditional approaches. In these cases scikit-learn has a nu......
Read more at Scikit-learn User Guide
6.2. Feature extraction
The sklearn.feature_extraction module can be used to extract features in a format supported by machine learning algorithms from datasets consisting of formats such as text and image. Loading featur......
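As a rough illustration of extracting features from text, the sketch below uses CountVectorizer from sklearn.feature_extraction.text to turn two short sentences into word counts. The two-sentence corpus is made up for this example and is not from the User Guide.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, invented for illustration only.
corpus = ["the cat sat", "the dog sat down"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse document-term matrix

# Columns follow the alphabetically sorted vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
counts = X.toarray().tolist()

print(vocabulary)
print(counts)
```

Each row is one document and each column is one vocabulary word, which is the numeric format that estimators expect.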
1.13. Feature selection
The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores or to boost their perfor......
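One simple way to see feature selection in action is VarianceThreshold, which drops features whose variance does not exceed a threshold (zero by default). This is a minimal sketch on a small hand-written matrix, chosen so that two columns are constant.

```python
from sklearn.feature_selection import VarianceThreshold

# Columns 0 and 3 are constant, so they carry no information.
X = [[0, 2, 0, 3],
     [0, 1, 4, 3],
     [0, 1, 1, 3]]

selector = VarianceThreshold()  # default threshold=0.0 removes constant features
X_reduced = selector.fit_transform(X)

# Indices of the columns that survived selection.
kept = selector.get_support(indices=True).tolist()
print(kept, X_reduced.shape)
```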
5. Visualizations
Scikit-learn defines a simple API for creating visualizations for machine learning. The key feature of this API is to allow for quick plotting and visual adjustments without recalculation. We provi......
1.4. Support Vector Machines
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection. The advantages of support vector machines are: Effective in high ......
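For the classification case, a minimal sketch with sklearn.svm.SVC might look like the following. The two training points and the linear kernel are choices made for this example only.

```python
from sklearn.svm import SVC

# Two toy training points, one per class (illustrative data).
X = [[0, 0], [1, 1]]
y = [0, 1]

clf = SVC(kernel="linear")
clf.fit(X, y)

# A point well on the class-1 side of the separating line.
prediction = clf.predict([[2.0, 2.0]])
print(prediction)
```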
1.10. Decision Trees
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning s......
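A small sketch of a decision tree classifier, assuming the same kind of two-point toy data as a stand-in for a real dataset:

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy data: one sample per class.
X = [[0, 0], [1, 1]]
y = [0, 1]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# The fitted tree needs a single split to separate the two samples.
depth = tree.get_depth()
prediction = tree.predict([[2, 2]])
print(depth, prediction)
```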
7.3. Generated datasets
In addition, scikit-learn includes various random sample generators that can be used to build artificial datasets of controlled size and complexity. Generators for classification and clustering: Th......
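As one example of such a generator, make_blobs produces labeled point clouds suitable for clustering or classification experiments. The sample count, number of centers and seed below are arbitrary choices for this sketch.

```python
from sklearn.datasets import make_blobs

# 100 two-dimensional points drawn around 3 centers; the seed makes it repeatable.
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)

labels = sorted(set(y.tolist()))
print(X.shape, labels)
```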
10. Common pitfalls and recommended practices
The purpose of this chapter is to illustrate some common pitfalls and anti-patterns that occur when using scikit-learn. It provides examples of what not to do, along with a corresponding correct ex......
7.1. Toy datasets
scikit-learn comes with a few small standard datasets that do not require downloading any file from an external website. They can be loaded using the following functions: These datasets are usefu......
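For instance, the iris dataset ships with the library and can be loaded with load_iris. A minimal sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Class names are stored alongside the numeric targets.
class_names = list(iris.target_names)
print(X.shape, class_names)
```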
6.3. Preprocessing data
The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream esti......
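A typical transformer from this package is StandardScaler, which rescales each feature to zero mean and unit variance. The sketch below uses a small hand-written matrix as stand-in training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative 3x3 feature matrix.
X_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])

scaler = StandardScaler().fit(X_train)  # learns per-column mean and scale
X_scaled = scaler.transform(X_train)

print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

Fitting the scaler once and reusing transform on later data keeps training and test data on the same scale.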
7.4. Loading other datasets
Sample images: Scikit-learn also embeds a couple of sample JPEG images published under Creative Commons license by their authors. Those images can be useful to test algorithms and pipelines on 2D d......
2.3. Clustering
Clustering of unlabeled data can be performed with the module sklearn.cluster. Each clustering algorithm comes in two variants: a class, that implements the fit method to learn the clusters on trai......
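To illustrate the class variant, here is a minimal KMeans sketch on six hand-picked points that form two obvious groups. The data and the choice of KMeans are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups along the first axis (illustrative data).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# After fit, the cluster assigned to each training sample is in labels_.
labels = kmeans.labels_
print(labels)
```

Which group receives label 0 or 1 is arbitrary; only the grouping itself is meaningful.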
2.1. Gaussian mixture models
sklearn.mixture is a package which enables one to learn Gaussian Mixture Models (diagonal, spherical, tied and full covariance matrices supported), sample them, and estimate them from data. Facilit......
1.3. Kernel ridge regression
Kernel ridge regression (KRR) [M2012] combines Ridge regression and classification (linear least squares with l2-norm regularization) with the kernel trick. It thus learns a linear function in the sp......
1.6. Nearest Neighbors
sklearn.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably m......
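For the unsupervised case, a minimal sketch with NearestNeighbors could look like this. The six points are illustrative; querying the training set itself means each point is returned as its own closest neighbor.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative 2D points in two loose groups.
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])

nn = NearestNeighbors(n_neighbors=2).fit(X)

# For each query point: distances to, and indices of, its 2 nearest neighbors.
distances, indices = nn.kneighbors(X)
print(indices)
```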
9. Model persistence
After training a scikit-learn model, it is desirable to have a way to persist the model for future use without having to retrain. The following sections give you some hints on how to persist a scik......
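One common approach, sketched here under the assumption that the standard-library pickle module is acceptable for your use case, is to serialize the fitted estimator and load it back later. The choice of estimator and dataset is arbitrary.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Serialize to bytes and restore, without retraining.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.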
1.12. Multiclass and multioutput algorithms
This section of the user guide covers functionality related to multi-learning problems, including multiclass, multilabel, and multioutput classification and regression. The modules in this section ......
1.1. Linear Models
The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the features. In mathematical notation, if \hat{y} is the predicted val......
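A minimal sketch of fitting such a linear combination with LinearRegression, using three toy points where the target equals either feature:

```python
from sklearn.linear_model import LinearRegression

# Illustrative data: y equals each (identical) feature.
X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]

reg = LinearRegression().fit(X, y)

# Learned weights and the prediction for a new point.
coefficients = reg.coef_.tolist()
predicted = float(reg.predict([[3, 3]])[0])
print(coefficients, predicted)
```

Because the two features are identical here, the weight is shared between them.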
1.5. Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions such as (linear) Support Vector Machines and Logis......
7.2. Real world datasets
scikit-learn provides tools to load larger datasets, downloading them if necessary. They can be loaded using the following functions: The Olivetti faces dataset: This dataset contains a set of face......
1.14. Semi-supervised learning
Semi-supervised learning is a situation in which some of the samples in your training data are not labeled. The semi-supervised estimators in sklearn.semi_supervised are able to make use of this ad......
1.2. Linear and Quadratic Discriminant Analysis
Linear Discriminant Analysis (LinearDiscriminantAnalysis) and Quadratic Discriminant Analysis (QuadraticDiscriminantAnalysis) are two classic classifiers, with, as their names suggest, a linear a......
6.5. Unsupervised dimensionality reduction
If your number of features is high, it may be useful to reduce it with an unsupervised step prior to supervised steps. Many of the Unsupervised learning methods implement a transform method that ca......
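PCA is one such unsupervised method with a transform step. The sketch below reduces the four iris features to two components; the dataset and the component count are choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # fit, then project onto the 2 leading components

# Fraction of the total variance kept by the two components.
retained = float(pca.explained_variance_ratio_.sum())
print(X_2d.shape, retained)
```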
3.1. Cross-validation: evaluating estimator performance
Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would ha......
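To avoid testing on the training data, one can hold out a test set or use cross-validation. The following is a minimal sketch with train_test_split and cross_val_score; the linear SVC, the 40% test split and the 5 folds are assumptions of this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out 40% of the samples for evaluation only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)
clf = SVC(kernel="linear", C=1).fit(X_train, y_train)
held_out_accuracy = clf.score(X_test, y_test)

# 5-fold cross-validation: one accuracy score per fold.
scores = cross_val_score(SVC(kernel="linear", C=1), X, y, cv=5)
print(held_out_accuracy, scores.mean())
```

Cross-validation reuses every sample for both training and evaluation across folds, which gives a less split-dependent estimate than a single hold-out set.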