Feature Extraction — Interactive Lecture Companion

01 / 07

Why features, anyway?

Raw pixels are fragile — rotate an image, add noise, change the lighting, and the pixel vector changes completely. A good feature is stable under these transformations while still distinguishing one thing from another. Watch the pixels explode while the histogram stays calm.

Perturbations

Noise level 0%

Rotation 0°

Brightness 0

Pixel L2 drift 0.00

Histogram drift 0.00

Under rotation the pixel vector changes drastically, while the histogram, which ignores pixel positions, barely moves. That's why we extract features.
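The two drift readouts above can be reproduced with a minimal sketch. It assumes a synthetic 64×64 image, a 16-bin intensity histogram as the feature, and a 90° turn (`np.rot90`) standing in for the rotation slider; all of these are illustrative choices, not the page's actual implementation.

```python
import numpy as np

def make_image(size=64):
    """Synthetic test image: an off-center bright rectangle on a dark background."""
    img = np.full((size, size), 0.2)
    img[8:32, 8:40] = 0.8
    return img

def histogram_feature(img, bins=16):
    """Normalized intensity histogram over [0, 1]; entries sum to 1."""
    counts, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()

def l2_drift(a, b):
    """Euclidean distance between two arrays after flattening."""
    return float(np.linalg.norm(np.ravel(a) - np.ravel(b)))

original = make_image()
transformed = np.rot90(original)  # assumed stand-in for the rotation slider

pixel_drift = l2_drift(original, transformed)
histogram_drift = l2_drift(histogram_feature(original),
                           histogram_feature(transformed))
print(f"Pixel L2 drift:  {pixel_drift:.2f}")
print(f"Histogram drift: {histogram_drift:.2f}")
```

A 90° rotation only permutes pixel positions, so the histogram is unchanged while the pixel vector moves a long way. Noise and brightness changes would alter the histogram to some degree, depending on the bin width.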

Original

Histogram (feature)

Transformed

Histogram (feature)

Overlay (difference)

Histogram overlay

02 / 07

SIFT — from scale to descriptor

SIFT builds an invariant feature in four stages. Step through them: detect the right scale, localize and prune keypoints, assign a canonical orientation, then build the 128-d descriptor.

Filter scale

Sigma σ 2.0

Peak σ for each blob

Small blob σ ≈ 3

Medium blob σ ≈ 5

Large blob σ ≈ 9

Strongest response

—

Each blob has a preferred σ. SIFT searches extrema across all of scale-space to find the right one automatically.
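The idea that each blob has a preferred σ can be checked numerically. The sketch below assumes ideal disk-shaped blobs and a scale-normalized Laplacian-of-Gaussian sampled on a pixel grid; the helper names, the radii, and the σ grid are hypothetical. For a disk of radius r, the response at the blob center is expected to peak near σ = r/√2.

```python
import numpy as np

def log_kernel(sigma, half):
    """Scale-normalized Laplacian of Gaussian on a (2*half+1) x (2*half+1) grid."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r2 = x**2 + y**2
    gauss = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    laplacian = (r2 - 2 * sigma**2) / sigma**4 * gauss
    return sigma**2 * laplacian  # sigma^2 factor makes responses comparable across scales

def disk(radius, half):
    """Binary disk blob centered on the grid."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return (x**2 + y**2 <= radius**2).astype(float)

def center_response(radius, sigma, half=40):
    """Magnitude of the filter response at the blob center (a single dot product)."""
    return abs(float(np.sum(log_kernel(sigma, half) * disk(radius, half))))

def best_sigma(radius, sigmas):
    """Sigma from the candidate list that gives the strongest center response."""
    responses = [center_response(radius, s) for s in sigmas]
    return float(sigmas[int(np.argmax(responses))])

SIGMAS = np.arange(1.0, 12.25, 0.25)  # assumed search grid
for radius in (4, 8, 12):             # assumed blob radii
    print(radius, best_sigma(radius, SIGMAS), radius / np.sqrt(2))
```

SIFT itself approximates this filter with differences of Gaussians and searches for extrema over position and scale together; the sketch only scans σ at a fixed location.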

Input image

LoG response (|DoG|)

Response strength vs. σ (per blob)

Small

Medium

Large

Current σ

Keypoint pruning

Contrast threshold 0.03

Edge ratio r 10.0

Counts

Raw extrema —

After contrast —

After edge test —

Drag edge ratio down: ridge points along edges vanish, leaving only corner-like keypoints. The Hessian-eigenvalue ratio test is what makes SIFT prefer corners.
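The two pruning tests can be written down in a few lines. This is a sketch under the usual SIFT formulation: a candidate is dropped if its |DoG| value is below the contrast threshold, or if the ratio of principal curvatures, estimated from the 2×2 Hessian of the DoG image, exceeds r. The function names are hypothetical, and the Hessian entries are assumed to be supplied by the caller.

```python
def passes_contrast(dog_value, threshold=0.03):
    """Keep only extrema whose |DoG| response reaches the contrast threshold."""
    return abs(dog_value) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r.

    Uses trace(H)^2 / det(H) < (r + 1)^2 / r, which avoids computing
    the eigenvalues of the Hessian explicitly.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign: not a well-defined extremum
        return False
    return trace * trace / det < (r + 1) ** 2 / r

def keep_keypoint(dog_value, dxx, dyy, dxy, threshold=0.03, r=10.0):
    """Apply both tests in the order shown by the counters above."""
    return passes_contrast(dog_value, threshold) and passes_edge_test(dxx, dyy, dxy, r)

# Corner-like point: similar curvature in both directions.
print(keep_keypoint(0.08, dxx=1.0, dyy=1.0, dxy=0.0))
# Edge-like point: strong curvature across the edge, weak along it.
print(keep_keypoint(0.08, dxx=20.0, dyy=1.0, dxy=0.0))
```

Lowering r tightens the bound (r + 1)²/r, so more elongated, ridge-like responses fail the test.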

Synthetic scene

|DoG| + survivors

Rejected (low contrast)

Rejected (edge-like)

Kept

Canonical orientation

Patch rotation 0°

Patch content

Dominant direction

—

Read it like this: the small cyan arrows are local image gradients. The rose plot tallies their directions (Gauss-weighted toward the center). The biggest amber wedge is the dominant direction — SIFT then rotates the patch to that angle so the descriptor is rotation-invariant.
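A minimal sketch of the orientation step described above: compute gradients, weight them by magnitude and by a Gaussian centered on the patch, accumulate them into 36 bins, and take the strongest bin. The Gaussian width and the choice to return the bin center are assumptions made here; a full SIFT implementation also interpolates the peak and may keep secondary peaks.

```python
import numpy as np

def dominant_orientation(patch, bins=36):
    """Return (angle_deg, histogram) for a 2D patch; angle is the center of the peak bin."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                     # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0  # gradient direction in [0, 360)

    # Gaussian weight centered on the patch so central gradients count more.
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = 0.25 * min(h, w)                        # assumed width, not a SIFT constant
    weight = np.exp(-((yy - (h - 1) / 2) ** 2 + (xx - (w - 1) / 2) ** 2)
                    / (2 * sigma ** 2))

    bin_width = 360.0 / bins
    idx = (angle // bin_width).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=(magnitude * weight).ravel(),
                       minlength=bins)
    return float(np.argmax(hist) * bin_width + bin_width / 2), hist

# Usage: a ramp that brightens toward the lower right has gradients at 45 degrees
# (image coordinates, y pointing down).
yy, xx = np.mgrid[0:16, 0:16]
angle, hist = dominant_orientation(xx + yy)
print(angle)
```

Rotating the patch by the negative of this angle is what gives the canonical, slider-independent view.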

Patch + gradient field + dominant arrow

Gradient orientation histogram (36 bins, 10° each)

Invariance check — rotate the slider and watch:

Input patch (rotates with slider)

After canonicalization (stays put)

128-d descriptor

Patch content

Compare patch B (rotated)

B rotated by 45°

Descriptor distance ‖A − B‖

—

A and B are the same patch — B is just rotated. Each is canonicalized first, so their 128-d descriptors should be nearly identical. Watch the bars track each other as you rotate.
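The 128 numbers come from 4 × 4 cells × 8 orientation bins. The sketch below is a simplified version, assuming a square patch that has already been canonicalized: it omits the Gaussian weighting and trilinear interpolation of the full method, and keeps only the per-cell histograms plus the normalize, clip at 0.2, renormalize step. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, cells=4, bins=8):
    """Simplified 4x4x8 gradient-histogram descriptor of a square patch."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    bin_idx = (angle // (360.0 / bins)).astype(int) % bins

    size = patch.shape[0] // cells                  # pixels per cell side
    desc = np.zeros((cells, cells, bins))
    for cy in range(cells):
        for cx in range(cells):
            rows = slice(cy * size, (cy + 1) * size)
            cols = slice(cx * size, (cx + 1) * size)
            desc[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)

    vec = desc.ravel()
    vec = vec / (np.linalg.norm(vec) + 1e-12)       # unit length cancels contrast scaling
    vec = np.minimum(vec, 0.2)                      # damp a few very strong gradients
    return vec / (np.linalg.norm(vec) + 1e-12)

# Usage: same content at four times the contrast gives (almost) the same descriptor.
rng = np.random.default_rng(0)
patch_a = rng.random((16, 16))
distance = np.linalg.norm(sift_like_descriptor(patch_a)
                          - sift_like_descriptor(4.0 * patch_a))
print(sift_like_descriptor(patch_a).shape, distance)
```

Rotation invariance does not come from this function; it comes from rotating the patch to its dominant direction before the descriptor is built.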

Patch A — 4×4 cells, 8-bin rose plots

Patch B — same content, rotated

Descriptor A (128 bins)

Descriptor B (128 bins)

03 / 07

Local Binary Patterns

Compare the center pixel to each of its eight neighbors. Each comparison is a single bit. Read the bits clockwise as a binary number. That's it — you've computed an LBP code. Click any cell to cycle its intensity and watch the code change in real time.

Presets

Center threshold g_c 128

Interpretation

—

LBP is invariant to monotonic intensity changes — if all pixels get brighter proportionally, the comparisons don't flip, and the code stays identical.
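The procedure above fits in a few lines. This sketch assumes a 3×3 neighborhood given as nested lists, the rule "neighbor ≥ center gives 1", and a clockwise read starting at the top-left neighbor; the starting neighbor is a convention chosen here and may differ from the widget's.

```python
def lbp_code(neighborhood):
    """LBP code (0 to 255) of a 3x3 neighborhood given as three rows of intensities."""
    center = neighborhood[1][1]
    # Clockwise from the top-left neighbor (assumed starting point).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for row, col in order:
        bit = 1 if neighborhood[row][col] >= center else 0
        code = (code << 1) | bit  # the first neighbor read ends up as b7
    return code

patch = [[200,  50, 200],
         [ 50, 128,  50],
         [200,  50, 200]]
brighter = [[int(1.5 * v) for v in row] for row in patch]

print(lbp_code(patch))     # bits 10101010 -> 170
print(lbp_code(brighter))  # same comparisons, same code -> 170
```

Scaling every pixel by 1.5 preserves the ordering between each neighbor and the center, which is why the code does not change.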

Neighborhood

click cells to edit

→

Thresholded

≥ g_c → 1

→

Bit sequence

00000000

b₇ b₆ b₅ b₄ b₃ b₂ b₁ b₀

LBP code (0–255) 0

04 / 07

Gabor filters — frequency meets space

A Gabor filter is a sine wave inside a Gaussian envelope. It asks a very specific question: is there a stripe pattern here, at this orientation and this spacing? Rotate θ and watch the response map light up over matching textures. Change λ to probe different feature sizes.

Filter parameters

Orientation θ 0°

Wavelength λ 8

Envelope σ 4

Aspect γ 0.5

A filter bank of maybe 4 orientations × 2 scales gives 8 responses per pixel: a compact texture descriptor that is also biologically plausible (the receptive fields of simple cells in V1 are well approximated by Gabor functions).
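As a rough illustration of the kernel defined by θ, λ, σ, and γ, the sketch below builds a real (cosine-phase) Gabor kernel and compares its response to a matching grating and to a perpendicular one. The grid size, the zero phase, the mean subtraction, and the helper names are assumptions; the defaults mirror the slider values shown above.

```python
import numpy as np

def gabor_kernel(theta_deg=0.0, wavelength=8.0, sigma=4.0, gamma=0.5, half=12):
    """Real Gabor kernel on a (2*half+1) x (2*half+1) grid."""
    theta = np.radians(theta_deg)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)    # coordinate across the stripes
    y_r = -x * np.sin(theta) + y * np.cos(theta)   # coordinate along the stripes
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength)
    kernel = envelope * carrier
    return kernel - kernel.mean()                  # zero mean: flat regions give no response

def grating(theta_deg, wavelength, half=12):
    """Cosine stripe pattern with the given orientation and spacing."""
    theta = np.radians(theta_deg)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return np.cos(2 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / wavelength)

def response(kernel, patch):
    """Magnitude of the filter response at the patch center (a single dot product)."""
    return abs(float(np.sum(kernel * patch)))

kernel = gabor_kernel(theta_deg=0.0, wavelength=8.0)
print("matching stripes:     ", response(kernel, grating(0.0, 8.0)))
print("perpendicular stripes:", response(kernel, grating(90.0, 8.0)))
```

Repeating this for each orientation and scale in the bank, and averaging the response magnitude over a region, gives one texture feature per filter.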

Gabor kernel

Test image (4 oriented patches)

Filter response

Strongest region

—

Mean response (feature)

0.00

05 / 07

Gray-level co-occurrence

The GLCM answers a single question: how often does intensity i sit next to intensity j? Smooth textures concentrate values on the diagonal (i ≈ j). Rough textures spread them everywhere. Four Haralick statistics capture this in a handful of numbers. Click the image to edit pixels.

Texture presets

Distance d 1

Angle θ

Radiomics computes GLCM features across many (d, θ) combinations, then averages them — this gives rotation-robust descriptors that physicians use to characterize tumor heterogeneity.
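A minimal sketch of the co-occurrence matrix and the four statistics shown in the panel, assuming a small integer image with 4 gray levels and a symmetric, normalized matrix. Definitions vary between libraries: here energy is the sum of squared entries, homogeneity uses 1/(1 + (i − j)²), and a flat image is assigned correlation 1 by convention. The function names are hypothetical.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Symmetric, normalized co-occurrence matrix P(i, j) for the pixel offset (dx, dy)."""
    image = np.asarray(image)
    counts = np.zeros((levels, levels))
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r, c], image[r2, c2]
                counts[i, j] += 1
                counts[j, i] += 1  # count each pair in both directions
    return counts / counts.sum()

def haralick(p):
    """Contrast, homogeneity, energy, and correlation of a normalized GLCM."""
    levels = p.shape[0]
    i, j = np.mgrid[0:levels, 0:levels]
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    denom = sd_i * sd_j
    correlation = float(np.sum((i - mu_i) * (j - mu_j) * p) / denom) if denom > 0 else 1.0
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "energy": float(np.sum(p ** 2)),
        "correlation": correlation,
    }

flat = [[2, 2, 2, 2]] * 4
checker = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]
print(haralick(glcm(flat)))     # all mass on the diagonal
print(haralick(glcm(checker)))  # all mass far from the diagonal
```

Averaging these statistics over several offsets, for example (dx, dy) in (1, 0), (1, 1), (0, 1), (−1, 1), is one way to obtain the rotation-robust values mentioned above.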

Input (4 gray levels)

click to cycle levels

Co-occurrence P(i,j)

i=0 i=1 i=2 i=3

j=0 j=1 j=2 j=3

Contrast

0.00

Homogeneity

0.00

Energy

0.00

Correlation

0.00

06 / 07

Shape descriptors

After segmentation, the geometry of the region carries information. Compactness tells you how circle-like something is, eccentricity how elongated. Deform the shape and watch the numbers respond — a perfect circle scores 1 on compactness; anything else scores less.

Shape type

Elongation 1.0

Bumpiness 0

Rotation 0°

Hu moments combine these ideas into 7 features that are invariant to translation, scale, and rotation, and they remain in use for shape matching decades after their introduction.
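The two headline numbers can be illustrated on an ellipse, which covers both the circle and the elongated case. The sketch assumes compactness is defined as 4πA/P² (consistent with "a perfect circle scores 1") and eccentricity as that of an ellipse with semi-axes a ≥ b; the outline is approximated by a polygon, and all helper names are hypothetical.

```python
import math

def ellipse_polygon(a, b, n=720):
    """Vertices of an ellipse with semi-axes a and b, sampled at n angles."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
            for k in range(n)]

def area_and_perimeter(points):
    """Shoelace area and total edge length of a closed polygon."""
    area = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(area) / 2.0, perimeter

def compactness(points):
    """4*pi*A / P^2: 1 for a circle, smaller for any other shape."""
    area, perimeter = area_and_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2

def eccentricity(a, b):
    """Ellipse eccentricity for a >= b > 0: 0 for a circle, approaching 1 when elongated."""
    return math.sqrt(1.0 - (b / a) ** 2)

for a, b in [(10.0, 10.0), (20.0, 5.0)]:
    print(a, b, round(compactness(ellipse_polygon(a, b)), 3), round(eccentricity(a, b), 3))
```

Both quantities are ratios of lengths, so scaling or rotating the shape leaves them unchanged, which is the same property the Hu moments are built to have.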

Segmented region

Area (px)

0

Perimeter (px)

0

Compactness

0.00

Eccentricity

0.00

07 / 07

The radiomics pipeline

Radiomics treats medical images as high-dimensional data. A single CT or MRI slice yields hundreds of quantitative features — first-order statistics, shape, texture — that feed machine learning models for diagnosis, prognosis, and treatment response. Click through each stage below.

What happens

A medical scanner produces a 3D volume: stacked 2D slices of intensity values (calibrated Hounsfield units for CT, activity concentration for PET; conventional MRI intensities are in arbitrary units weighted by relaxation times). Quality depends on scanner model, acquisition protocol, and reconstruction kernel.

Feature definitions are modality-independent, but the same lesion imaged on two different CT scanners can yield different feature values — standardization (IBSI) matters.

Simulated CT slice

What happens

A radiologist (or an algorithm) delineates the region of interest — a tumor, an organ, a lesion. All subsequent features are computed only over this mask. Segmentation quality directly limits feature reproducibility.

Inter-observer variability in segmentation is one of the largest sources of noise in radiomics studies.

ROI delineation

What happens

Hundreds of quantitative features are computed from the ROI — intensity statistics, shape geometry, and texture matrices (GLCM, GLRLM, GLSZM…). Features can also be computed from filtered versions of the image (LoG, wavelets).

PyRadiomics computes ~100 features per image per filter — quickly reaching 1500+ features for a single ROI.

Feature vector (excerpt)

FIRST-ORDER (8)
Mean intensity	1247.3	HU
Std deviation	183.7	HU
Skewness	-0.42
Entropy	4.18	bits
SHAPE (14)
Volume	3.2	cm³
Sphericity	0.73
Elongation	0.61
TEXTURE — GLCM (24)
Contrast	38.91
Homogeneity	0.44
Correlation	0.72
TEXTURE — GLRLM (16)
Short-run emphasis	0.81
Run-length non-uniformity	124.3
FILTERED — LoG σ=2 (+100)
LoG-GLCM contrast	12.7
…	1247 more
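The first-order entries in the excerpt above are plain statistics of the voxels inside the ROI mask. A minimal sketch, assuming a 2D array, a boolean mask, and a fixed bin width of 25 intensity units for the entropy histogram; the function name and the binning details are assumptions and will not exactly match any particular radiomics package.

```python
import numpy as np

def first_order_features(image, mask, bin_width=25.0):
    """Mean, standard deviation, skewness, and histogram entropy (bits) inside a mask."""
    values = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    mean = values.mean()
    std = values.std()  # population standard deviation
    skew = float(np.mean((values - mean) ** 3) / std ** 3) if std > 0 else 0.0

    # Discretize with a fixed bin width before computing entropy.
    bins = np.floor((values - values.min()) / bin_width).astype(int)
    p = np.bincount(bins) / bins.size
    p = p[p > 0]
    entropy = float(-np.sum(p * np.log2(p)))
    return {"mean": float(mean), "std": float(std), "skewness": skew, "entropy": entropy}

# Toy example: the voxel with value 999 lies outside the ROI and is ignored.
image = [[0, 100], [50, 999]]
mask = [[1, 1], [1, 0]]
print(first_order_features(image, mask))
```

Because the entropy depends on the bin width, it is one of the settings that has to be fixed and reported for feature values to be comparable across studies.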

What happens

With 1500+ features and often fewer than 200 patients, overfitting is almost guaranteed unless the feature set is reduced. Selection methods (LASSO, correlation filtering, mutual information) prune the feature set to a handful that are reproducible, non-redundant, and predictive of the clinical outcome.

The ratio of samples to features should comfortably exceed 10:1 — otherwise features start to "remember" patients instead of learning biology.
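Of the selection methods named above, correlation filtering is the simplest to sketch. The version below is a greedy, unsupervised filter that assumes a samples-by-features matrix with no constant columns; the 0.9 threshold, the function name, and the toy data are illustrative. It removes redundancy only and does not check whether a feature predicts the outcome.

```python
import numpy as np

def correlation_filter(X, threshold=0.9):
    """Indices of features kept by a greedy redundancy filter.

    A feature is kept only if its absolute Pearson correlation with every
    previously kept feature is at most `threshold`.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlation matrix
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: 20 "patients", 4 features; features 1 and 3 are linear copies of feature 0.
a = np.arange(20.0)
b = np.tile([1.0, -1.0], 10)
X = np.column_stack([a, 3 * a + 1, b, -a])

kept = correlation_filter(X)
print(kept)                          # [0, 2]
print(X.shape[0] / len(kept), ": 1") # samples per kept feature
```

After filtering, the sample-to-feature ratio in the toy example is 20 / 2 = 10, right at the rule-of-thumb boundary quoted above.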

Before & after selection

Kept (8)

Dropped (1492)

What happens

Selected features feed a classifier or regression model — logistic regression, random forests, gradient boosting — trained to predict diagnosis, survival, or treatment response. Performance must be validated on an external cohort before any clinical claim.

A radiomics signature that works only on the training scanner is a scanner signature, not a biology signature.
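The AUC reported for an external cohort can be computed directly from model scores and outcomes. A minimal sketch, assuming binary labels (1 = event, 0 = no event) and using the pairwise definition of AUC; the function name and the example scores are made up for illustration.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative one.

    Ties count as one half. Runs in O(P * N), which is fine for small cohorts.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores from the same model on two cohorts.
train_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
external_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(train_auc, external_auc)  # 1.0 0.75
```

A large gap between the two numbers, as in this toy case, is the pattern to look for when a signature has learned the training scanner rather than the biology.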

Simulated risk score

Predicted 2-year survival 68%

Model AUC (external) 0.74

Recommendation Standard therapy