Why features, anyway?
Raw pixels are fragile: rotate an image, add noise, or change the lighting, and the pixel vector can change almost entirely. A good feature is stable under these transformations while still distinguishing one thing from another. An intensity histogram, for instance, ignores where each pixel sits, so rotating the image leaves it nearly unchanged even though most individual pixel values move.
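To make the contrast concrete, here is a minimal pure-Python sketch on a toy 4×4 image (values invented for illustration). It rotates the image by exactly 90°, so no interpolation is involved, and compares how far the flattened pixel vector moves with how far the gray-level histogram moves. The helper names are hypothetical.

```python
from collections import Counter

def rotate90(img):
    """Rotate a 2-D list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flatten(img):
    return [p for row in img for p in row]

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def histogram(img, levels):
    """Count how many pixels take each gray level 0..levels-1."""
    counts = Counter(flatten(img))
    return [counts.get(v, 0) for v in range(levels)]

img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [0, 0, 7, 7],
       [3, 3, 1, 1]]
rot = rotate90(img)

pixel_change = l1_distance(flatten(img), flatten(rot))
hist_change = l1_distance(histogram(img, 8), histogram(rot, 8))
print(pixel_change, hist_change)
```

The pixel distance is large while the histogram distance is zero. A global brightness change would shift the same histogram, so this particular feature buys rotation invariance, not lighting invariance.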
SIFT — from scale to descriptor
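SIFT (scale-invariant feature transform) builds a feature that is invariant to scale and rotation in four stages: detect keypoints at their characteristic scale (extrema of a difference-of-Gaussians scale space), localize them precisely and prune unstable ones (low contrast or lying along edges), assign each a canonical orientation, then build the 128-dimensional descriptor.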
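The last stage is the easiest to show in isolation. The sketch below is a simplified version written for illustration, not Lowe's full algorithm: it bins gradient orientations of a 16×16 patch into a 4×4 grid of 8-bin histograms (4 × 4 × 8 = 128), measuring angles relative to the keypoint's dominant orientation, then normalizes, clips at 0.2, and renormalizes. It assumes the patch has already been resampled at the keypoint's scale and omits Gaussian weighting, trilinear interpolation, and rotation of the sampling grid; `sift_like_descriptor` and `normalize_clip` are hypothetical names.

```python
import math

def normalize_clip(vec, clip=0.2):
    """Unit-normalize, clip entries above `clip`, then renormalize.

    Clipping limits the influence of a few very strong gradients
    (e.g. from non-linear illumination changes).
    """
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v] if norm > 0 else list(v)
    return unit([min(c, clip) for c in unit(vec)])

def sift_like_descriptor(patch, ref_angle=0.0):
    """Simplified 128-d SIFT-style descriptor of a 16x16 patch.

    patch: 16x16 list of lists of floats, assumed already resampled around the
    keypoint at its detected scale. ref_angle: the keypoint's dominant
    orientation in radians. Omits Gaussian weighting, trilinear interpolation
    and rotation of the sampling grid used in the full method.
    """
    size = 16
    hist = [0.0] * 128  # 4x4 spatial cells x 8 orientation bins
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(dx, dy)
            # Orientation measured relative to the keypoint's orientation.
            theta = (math.atan2(dy, dx) - ref_angle) % (2 * math.pi)
            obin = int(theta / (2 * math.pi) * 8) % 8
            cell = (y // 4) * 4 + (x // 4)
            hist[cell * 8 + obin] += mag
    return normalize_clip(hist)

ramp = [[float(x) for x in range(16)] for _ in range(16)]  # brightness rises left to right
desc = sift_like_descriptor(ramp)
print(len(desc), [round(c, 3) for c in desc[:8]])
```

On a horizontal ramp every gradient points the same way, so each cell puts all of its mass into orientation bin 0.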
Local Binary Patterns
Compare the center pixel with each of its eight neighbors. Each comparison yields one bit: 1 if the neighbor is at least as bright as the center, 0 otherwise. Read the eight bits clockwise from a fixed starting neighbor as a binary number, and you have computed an LBP code, an integer from 0 to 255.
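A direct translation into Python, as a sketch. The bit order (clockwise from the top-left neighbor, first neighbor as the most significant bit) is one common choice; implementations differ in start position and direction, so codes are only comparable within one convention. `lbp_code` is a hypothetical helper.

```python
def lbp_code(patch):
    """8-neighbor LBP code for the center of a 3x3 patch (list of lists).

    Assumed convention: bit = 1 if neighbor >= center; neighbors read
    clockwise from the top-left; the first neighbor is the most significant bit.
    """
    center = patch[1][1]
    # (row, col) positions, clockwise starting at the top-left neighbor
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in order:
        code = (code << 1) | (1 if patch[r][c] >= center else 0)
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # bits 10001111 -> 143
```

Because only comparisons matter, adding a constant to every pixel or multiplying by a positive gain leaves the code unchanged, which is why LBP tolerates monotonic lighting changes.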
Gabor filters — frequency meets space
A Gabor filter is a sinusoid multiplied by a Gaussian envelope. It asks a very specific question: is there a stripe pattern here, at this orientation and this spacing? The orientation θ selects which stripe direction the filter matches, and the wavelength λ sets the stripe spacing, so varying λ probes features of different sizes. The response is strongest over textures whose stripes match both.
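A minimal sketch of the real (cosine) part of a Gabor kernel in pure Python, using the common parameterization with envelope width σ, spatial aspect ratio γ, and phase ψ. Those three parameters, their default values, and the helper names are assumptions rather than anything stated in the text. It probes synthetic vertical stripes with an 8-pixel period at θ = 0 and θ = π/2.

```python
import math

def gabor_kernel(sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel: a cosine carrier under a Gaussian envelope.

    sigma: envelope width (pixels); theta: direction (radians) along which the
    carrier oscillates, i.e. the normal to the stripes; lam: carrier wavelength
    (pixels); gamma: aspect ratio (assumed <= 1 here); psi: phase offset.
    Returns (half, kernel) where kernel[y + half][x + half] is the weight at (x, y).
    """
    half = int(math.ceil(3 * sigma / min(gamma, 1.0)))
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier runs along xr.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return half, kernel

def response_at_origin(kernel_spec, image_fn):
    """Correlate the kernel with image_fn(x, y), centered at the origin."""
    half, kernel = kernel_spec
    return sum(kernel[y + half][x + half] * image_fn(x, y)
               for y in range(-half, half + 1)
               for x in range(-half, half + 1))

def stripes(x, y):
    """Vertical stripes with an 8-pixel period: intensity varies along x only."""
    return math.cos(2 * math.pi * x / 8)

matched = response_at_origin(gabor_kernel(4, 0.0, 8), stripes)
rotated = response_at_origin(gabor_kernel(4, math.pi / 2, 8), stripes)
print(f"theta=0: {matched:.2f}   theta=pi/2: {rotated:.4f}")
```

The matched orientation gives a large positive response and the orthogonal one nearly zero. In practice a bank of kernels over several θ and λ values is convolved with the whole image, giving one response map per filter.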
Gray-level co-occurrence
The GLCM (gray-level co-occurrence matrix) answers a single question: for a fixed offset, such as one pixel to the right, how often does intensity i sit next to intensity j? Smooth textures concentrate counts near the diagonal (i ≈ j); rough textures spread them far from it. A handful of Haralick statistics, such as contrast, homogeneity, energy, and correlation, summarize the matrix in a few numbers.
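A sketch of the GLCM and four common Haralick-style statistics, assuming the image is already quantized to a small number of gray levels and using a single offset (one pixel to the right by default). The matrix is made symmetric by counting each pair in both directions and normalized to sum to 1; energy is reported as the angular second moment ΣP². Function names are hypothetical.

```python
import math

def glcm(img, levels, dr=0, dc=1):
    """Symmetric, normalized GLCM for one offset (dr, dc).

    Assumes img is a list of lists of ints already quantized to 0..levels-1.
    """
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                P[i][j] += 1
                P[j][i] += 1  # count both directions -> symmetric matrix
    total = sum(map(sum, P))
    return [[v / total for v in row] for row in P]

def haralick_stats(P):
    """Contrast, homogeneity, angular second moment (energy) and correlation."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(p * (i - j) ** 2 for i, j, p in cells)
    homogeneity = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    asm = sum(p * p for _, _, p in cells)
    # For a symmetric GLCM the row and column marginals coincide.
    mu = sum(i * p for i, _, p in cells)
    var = sum((i - mu) ** 2 * p for i, _, p in cells)
    # Correlation is undefined when all pixels share one level (var == 0).
    corr = (sum((i - mu) * (j - mu) * p for i, j, p in cells) / var
            if var > 0 else float("nan"))
    return {"contrast": contrast, "homogeneity": homogeneity,
            "asm": asm, "correlation": corr}

checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(haralick_stats(glcm(checker, 2)))
```

On a 0/1 checkerboard every horizontal pair differs, so all the mass sits off the diagonal: contrast 1, homogeneity 0.5, correlation -1. A constant image puts everything on the diagonal, giving contrast 0 and homogeneity 1.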
Shape descriptors
After segmentation, the geometry of the region carries information. Compactness measures how circle-like a shape is, and eccentricity how elongated it is. With compactness defined as 4πA/P² (area A, perimeter P), a perfect circle scores exactly 1 and every other shape scores less; eccentricity is 0 for a circle and approaches 1 as the shape stretches toward a line.
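Two small helpers, as a sketch: compactness from a polygon outline via the shoelace formula, and eccentricity of the ellipse that has the same second moments (covariance) as the region's pixel coordinates. Perimeters measured on a pixel grid are noisy, so taking a polygon outline as input is a simplifying assumption; all names are hypothetical.

```python
import math

def polygon_area_perimeter(pts):
    """Shoelace area and perimeter of a simple polygon [(x, y), ...]."""
    area2, perim = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):  # wrap to close
        area2 += x1 * y2 - x2 * y1
        perim += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2, perim

def compactness(pts):
    """4*pi*A / P**2: 1 for a circle, below 1 for any other shape."""
    area, perim = polygon_area_perimeter(pts)
    return 4 * math.pi * area / perim ** 2

def eccentricity(pixels):
    """Eccentricity of the ellipse with the same second moments as a pixel set.

    pixels: list of (x, y) coordinates inside the segmented region.
    """
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Eigenvalues l1 >= l2 of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(max(0.0, ((sxx - syy) / 2) ** 2 + sxy ** 2))
    l1, l2 = half_trace + root, max(0.0, half_trace - root)
    return math.sqrt(1 - l2 / l1) if l1 > 0 else 0.0

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(compactness(square))  # pi/4
```

A square gives π/4 ≈ 0.785 and a 1000-sided regular polygon gives just under 1; a square block of pixels has eccentricity 0 and a one-pixel-wide line has eccentricity 1.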
The radiomics pipeline
Radiomics treats medical images as high-dimensional data. A single CT or MRI scan can yield hundreds of quantitative features (first-order statistics, shape, texture) that feed machine learning models for diagnosis, prognosis, and prediction of treatment response. The pipeline runs through five stages: acquisition, segmentation, feature extraction, feature selection, and modeling.
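A medical scanner produces a 3D volume: a stack of 2D slices. CT intensities are calibrated in Hounsfield units and PET intensities measure tracer activity, whereas conventional MRI intensities are in arbitrary units weighted by tissue relaxation times, so they usually need normalization before features are compared across scans. Image quality, and with it every downstream feature, depends on the scanner model, acquisition protocol, and reconstruction kernel.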
A radiologist (or an algorithm) delineates the region of interest (ROI), such as a tumor, an organ, or a lesion, as a binary mask. All subsequent features are computed only over this mask, so segmentation quality directly limits feature reproducibility.
Hundreds of quantitative features are computed from the ROI: intensity statistics, shape geometry, and texture matrices (GLCM, GLRLM for gray-level run lengths, GLSZM for gray-level size zones, …). Features can also be computed from filtered versions of the image, such as Laplacian-of-Gaussian (LoG) or wavelet-filtered images.
| Feature | Value | Unit |
|---|---|---|
| FIRST-ORDER (8) | | |
| Mean intensity | 1247.3 | HU |
| Std deviation | 183.7 | HU |
| Skewness | -0.42 | |
| Entropy | 4.18 | bits |
| SHAPE (14) | | |
| Volume | 3.2 | cm³ |
| Sphericity | 0.73 | |
| Elongation | 0.61 | |
| TEXTURE: GLCM (24) | | |
| Contrast | 38.91 | |
| Homogeneity | 0.44 | |
| Correlation | 0.72 | |
| TEXTURE: GLRLM (16) | | |
| Short-run emphasis | 0.81 | |
| Run-length non-uniformity | 124.3 | |
| FILTERED: LoG σ=2 (+100) | | |
| LoG-GLCM contrast | 12.7 | |
| … | 1247 more | |
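The first-order rows can be reproduced with a short sketch that uses only the pixels inside the ROI mask. For brevity it works on a single 2-D slice, uses population (1/n) moments, and discretizes intensities with a fixed bin width before computing entropy. The bin width of 25, the moment convention, and the function name are assumptions, not values from the text.

```python
import math

def first_order_features(image, mask, bin_width=25.0):
    """Mean, standard deviation, skewness and entropy over ROI pixels only.

    image, mask: equally shaped nested lists (one 2-D slice); mask entries are
    truthy inside the ROI. bin_width: assumed fixed bin width in image units
    (e.g. HU) used to discretize intensities for the entropy.
    """
    roi = [v for img_row, m_row in zip(image, mask)
           for v, m in zip(img_row, m_row) if m]
    n = len(roi)
    mean = sum(roi) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in roi) / n)  # population moments
    skew = (sum((v - mean) ** 3 for v in roi) / n) / std ** 3 if std > 0 else 0.0
    # Fixed-bin-width discretization, then Shannon entropy in bits.
    counts = {}
    for v in roi:
        b = math.floor(v / bin_width)
        counts[b] = counts.get(b, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "std": std, "skewness": skew, "entropy": entropy}

image = [[0, 10, 20],
         [30, 40, 50]]
mask = [[1, 1, 0],
        [0, 1, 1]]
print(first_order_features(image, mask))
```

Here the ROI holds 0, 10, 40 and 50, so the mean is 25, the skewness is 0 (the values are symmetric about the mean), and the entropy is 1.5 bits across three occupied bins; the masked-out 20 and 30 never enter the computation.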
With 1500+ features and often fewer than 200 patients, a model trained on all of them will almost certainly overfit. Selection methods (LASSO, correlation filtering, mutual information) prune the set to a handful of features that are reproducible, non-redundant, and predictive of the clinical outcome.
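Correlation filtering, the simplest of the three methods named above, can be sketched as a greedy pass: keep a feature only if it is not too strongly correlated with any feature already kept. The 0.9 threshold, the visiting order, and the helper and feature names are assumptions. Whatever method is used, fitting the selection on the training data only (inside each cross-validation fold) avoids leaking information from the evaluation data.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def correlation_filter(features, threshold=0.9):
    """Greedy redundancy filter.

    features: dict mapping feature name -> list of values (one per patient).
    A feature is kept only if |r| <= threshold against every kept feature;
    features are visited in dict order, so earlier ones win ties.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

features = {
    "volume": [1, 2, 3, 4, 5],
    "volume_x2": [2, 4, 6, 8, 10],  # perfectly redundant with "volume"
    "entropy": [3, 1, 4, 1, 5],
}
print(correlation_filter(features))
```

The redundant copy is dropped while the weakly correlated feature survives. Redundancy filtering alone says nothing about predictive value, which is why it is usually combined with an outcome-aware method such as LASSO.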
Selected features feed a classifier or regression model (logistic regression, random forests, gradient boosting) trained to predict diagnosis, survival, or treatment response. Performance must be validated on an independent external cohort before any clinical claim is made.
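As a sketch of the train-then-validate pattern, the block below fits an unregularized logistic regression by batch gradient descent on one cohort and scores it on a separate cohort with ROC AUC. AUC is one common metric and is used here as an assumption; the toy cohorts, learning rate, epoch count, and helper names are hypothetical, and a real study would rely on a tested library and standardized features.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression without regularization."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear(w, b, xi)) - yi  # gradient of the log-loss
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def auc(scores, labels):
    """ROC AUC: chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical one-feature cohorts: fit on "train", evaluate only on "external".
train_X, train_y = [[-2], [-1], [-0.5], [0.5], [1], [2]], [0, 0, 0, 1, 1, 1]
ext_X, ext_y = [[-1.5], [-0.2], [0.3], [1.7]], [0, 0, 1, 1]

w, b = fit_logistic(train_X, train_y)
ext_scores = [sigmoid(linear(w, b, x)) for x in ext_X]
print("external AUC:", auc(ext_scores, ext_y))
```

The key design point is the separation: the external cohort contributes nothing to fitting (or to feature selection), so its AUC estimates how the model would behave on new patients rather than how well it memorized the training set.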