8.1. Strategies to scale computationally: bigger data

 Scikit-learn User Guide

For some applications the number of examples, features (or both) and/or the speed at which they need to be processed are challenging for traditional approaches. In these cases scikit-learn has a nu......

Read more at Scikit-learn User Guide

6.2. Feature extraction

 Scikit-learn User Guide

The sklearn.feature_extraction module can be used to extract features in a format supported by machine learning algorithms from datasets consisting of formats such as text and images. Loading featur......

Read more at Scikit-learn User Guide

1.13. Feature selection

 Scikit-learn User Guide

The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores or to boost their perfor......

Read more at Scikit-learn User Guide

5. Visualizations

 Scikit-learn User Guide

Scikit-learn defines a simple API for creating visualizations for machine learning. The key feature of this API is to allow for quick plotting and visual adjustments without recalculation. We provi......

Read more at Scikit-learn User Guide

1.4. Support Vector Machines

 Scikit-learn User Guide

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection. The advantages of support vector machines are: Effective in high ......

Read more at Scikit-learn User Guide

1.10. Decision Trees

 Scikit-learn User Guide

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning s......

Read more at Scikit-learn User Guide

7.3. Generated datasets

 Scikit-learn User Guide

In addition, scikit-learn includes various random sample generators that can be used to build artificial datasets of controlled size and complexity. Generators for classification and clustering: Th......

Read more at Scikit-learn User Guide

10. Common pitfalls and recommended practices

 Scikit-learn User Guide

The purpose of this chapter is to illustrate some common pitfalls and anti-patterns that occur when using scikit-learn. It provides examples of what not to do, along with a corresponding correct ex......

Read more at Scikit-learn User Guide

7.1. Toy datasets

 Scikit-learn User Guide

scikit-learn comes with a few small standard datasets that do not require downloading any file from an external website. They can be loaded using the following functions: These datasets are usefu......

Read more at Scikit-learn User Guide

6.3. Preprocessing data

 Scikit-learn User Guide

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream esti......

Read more at Scikit-learn User Guide

7.4. Loading other datasets

 Scikit-learn User Guide

Sample images: Scikit-learn also embeds a couple of sample JPEG images published under a Creative Commons license by their authors. Those images can be useful to test algorithms and pipelines on 2D d......

Read more at Scikit-learn User Guide

2.3. Clustering

 Scikit-learn User Guide

Clustering of unlabeled data can be performed with the module sklearn.cluster. Each clustering algorithm comes in two variants: a class that implements the fit method to learn the clusters on trai......

Read more at Scikit-learn User Guide

2.1. Gaussian mixture models

 Scikit-learn User Guide

sklearn.mixture is a package which enables one to learn Gaussian Mixture Models (diagonal, spherical, tied and full covariance matrices supported), sample them, and estimate them from data. Facilit......

Read more at Scikit-learn User Guide

1.3. Kernel ridge regression

 Scikit-learn User Guide

Kernel ridge regression (KRR) [M2012] combines Ridge regression and classification (linear least squares with l2-norm regularization) with the kernel trick. It thus learns a linear function in the sp......

Read more at Scikit-learn User Guide

1.6. Nearest Neighbors

 Scikit-learn User Guide

sklearn.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably m......

Read more at Scikit-learn User Guide

9. Model persistence

 Scikit-learn User Guide

After training a scikit-learn model, it is desirable to have a way to persist the model for future use without having to retrain. The following sections give you some hints on how to persist a scik......

Read more at Scikit-learn User Guide

1.12. Multiclass and multioutput algorithms

 Scikit-learn User Guide

This section of the user guide covers functionality related to multi-learning problems, including multiclass, multilabel, and multioutput classification and regression. The modules in this section ......

Read more at Scikit-learn User Guide

1.1. Linear Models

 Scikit-learn User Guide

The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the features. In mathematical notation, if \hat{y} is the predicted val......

Read more at Scikit-learn User Guide
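
As a reminder of what "a linear combination of the features" means in the Linear Models entry above, the relationship is conventionally written as below. The symbol names are an assumption based on common usage, since the excerpt is truncated: w_0 stands for the intercept, w_1, ..., w_p for the coefficients, and x_1, ..., x_p for the features.

```latex
\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_p x_p
```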

1.5. Stochastic Gradient Descent

 Scikit-learn User Guide

Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions such as (linear) Support Vector Machines and Logis......

Read more at Scikit-learn User Guide
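
To make the idea in the Stochastic Gradient Descent entry above concrete, here is a minimal pure-Python sketch of SGD fitting a one-feature linear regressor under the (convex) squared loss. It is not scikit-learn's implementation; the function name `sgd_linear_fit`, the learning rate, the epoch count, and the toy data are all hypothetical choices made for illustration.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w * x + b by SGD on the squared loss, one sample per step.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        rng.shuffle(order)
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Noise-free toy data generated from y = 2x + 1 (made up for this sketch).
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update uses a single sample, which is what keeps the per-step cost low; the trade-off is that the step size and number of passes usually need tuning.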

7.2. Real world datasets

 Scikit-learn User Guide

scikit-learn provides tools to load larger datasets, downloading them if necessary. They can be loaded using the following functions: The Olivetti faces dataset: This dataset contains a set of face......

Read more at Scikit-learn User Guide

1.14. Semi-supervised learning

 Scikit-learn User Guide

Semi-supervised learning is a situation in which some of the samples in your training data are not labeled. The semi-supervised estimators in sklearn.semi_supervised are able to make use of this ad......

Read more at Scikit-learn User Guide

1.2. Linear and Quadratic Discriminant Analysis

 Scikit-learn User Guide

Linear Discriminant Analysis (LinearDiscriminantAnalysis) and Quadratic Discriminant Analysis (QuadraticDiscriminantAnalysis) are two classic classifiers, with, as their names suggest, a linear a......

Read more at Scikit-learn User Guide

6.5. Unsupervised dimensionality reduction

 Scikit-learn User Guide

If your number of features is high, it may be useful to reduce it with an unsupervised step prior to supervised steps. Many of the Unsupervised learning methods implement a transform method that ca......

Read more at Scikit-learn User Guide

3.1. Cross-validation: evaluating estimator performance

 Scikit-learn User Guide

Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would ha......

Read more at Scikit-learn User Guide
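
The mistake described in the cross-validation entry above can be sketched in a few lines of plain Python. The `MemorizingModel` class, the `accuracy` helper, and the toy data are hypothetical stand-ins (not scikit-learn APIs): the model simply repeats the labels it has seen, so it looks perfect on its own training data and much worse on held-out samples.

```python
class MemorizingModel:
    """Hypothetical model that only repeats labels of samples it has seen."""

    def fit(self, X, y):
        self.table_ = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, X, default=0):
        # Unseen samples fall back to a fixed default label.
        return [self.table_.get(tuple(x), default) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data: the label is the parity of the coordinate sum.
X = [(i, j) for i in range(6) for j in range(6)]
y = [(i + j) % 2 for i, j in X]

# Hold out every third sample as a test set; train on the rest.
X_train = [x for k, x in enumerate(X) if k % 3 != 0]
y_train = [t for k, t in enumerate(y) if k % 3 != 0]
X_test = [x for k, x in enumerate(X) if k % 3 == 0]
y_test = [t for k, t in enumerate(y) if k % 3 == 0]

model = MemorizingModel().fit(X_train, y_train)
train_score = accuracy(y_train, model.predict(X_train))
test_score = accuracy(y_test, model.predict(X_test))
print(train_score, test_score)
```

Scoring on the training data alone would report a perfect result here, which is why evaluation needs samples the model has not seen.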