
Feature Selection and Feature Engineering and Why It Matters

An Educational Talk

What this educational talk is

  • an introduction to why feature selection is an essential step in building data science (DS) and machine learning (ML) pipelines.
  • an introduction to basic feature engineering.
  • an indication of where to focus time and effort in the DS pipeline.

What this educational talk is not

  • an exhaustive guide to feature engineering.
  • a one-size-fits-all solution guide.

Objective of this educational talk

Establish an intuition about where to focus time and effort when creating DS and ML pipelines.
(This assumes that a dataset is available that has already been prepared and cleaned.)

Modules and Materials Collection

Note: the content of this educational talk is also being used in presentations.

Slides

Slide deck used in the presentations and video materials.

Jupyter Notebooks

Notebooks used in the presentations and video materials.
Note: these notebooks were developed using older versions of packages (particularly scikit-learn). They have not been verified with current versions.

Demo Notebook Selection

Demo Notebook Feature Engineering

Dataset

Madelon Dataset paper

Design of experiments for the NIPS 2003 variable selection benchmark

Isabelle Guyon – July 2003

Copy of the Dataset

Algorithms

Boruta Feature Selection Heuristic

Feature Selection with the Boruta Package

Miron B. Kursa (University of Warsaw)

Witold R. Rudnicki (University of Warsaw)

Videos
Overview
Feature Selection
Feature Selection notebook
Feature Engineering
Feature Engineering notebook
Remarks and Results