A lightweight C# library demo for extracting thumbnails from HEIF images (e.g. iPhone .heic files) using the native libheif library.
The library decodes the embedded thumbnail from the image and converts it to a standard RGB format for easy viewing or further processing.
- .NET 8 or later
- libheif 1.20+ and its dependencies
Place the native libheif binaries next to your application. The library searches for a library named `libheif`, so the binary built by vcpkg (which uses a different name) needs to be renamed.
E.g. for Windows, you need:
- libde265.dll
- libheif.dll
- libx265.dll
You must query width/height using the channel that actually exists in the decoded image.
libheif only returns width/height for channels that actually have a plane.
If the plane does not exist → width = -1, height = -1.
So the correct logic is:
1. Ask for the Y plane
2. If the Y plane exists → use it
3. If not → try the interleaved plane
4. If not → try the R/G/B planes
5. If not → the decode failed
This is exactly how libheif’s own C++ helpers behave internally, and it guarantees you always get the correct dimensions.
Because:
- YCbCr images → the Y plane always exists
- RGB images → the interleaved plane exists
- Some images → R/G/B planes exist separately
- libheif returns -1 only when the plane does not exist
By checking planes in the correct order, you always get a valid size.
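The fallback order can be sketched as plain logic, independent of the interop layer. Here `HeifChannel` and the `query` delegate are hypothetical stand-ins for whatever wrapper exposes `heif_image_get_width`/`heif_image_get_height`:

```csharp
using System;

enum HeifChannel { Y, Interleaved, R, G, B }

static class DecodedSize
{
    // Try channels in the order described above: Y first, then
    // interleaved, then R (for planar RGB). `query` returns (-1, -1)
    // when the plane does not exist, mirroring libheif's behavior.
    public static (int Width, int Height) Get(Func<HeifChannel, (int, int)> query)
    {
        foreach (var ch in new[] { HeifChannel.Y, HeifChannel.Interleaved, HeifChannel.R })
        {
            var (w, h) = query(ch);
            if (w > 0 && h > 0)
                return (w, h);
        }
        throw new InvalidOperationException("No decoded plane found - decode failed.");
    }
}
```

With a real wrapper, `query` would call `heif_image_get_width` and `heif_image_get_height` for the given channel.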
Let me break it down clearly.
Even though the decoded colorspace is YCbCr and the chroma is 4:4:4,
libheif still exposes the decoded planes using the R/G/B channel enums when the decoder plugin internally converts the YCbCr planes into an RGB-like layout.
This happens in practice:
Some HEVC/AVIF decoders (especially on Windows) output:
- R plane
- G plane
- B plane
instead of:
- Y plane
- Cb plane
- Cr plane
even though the colorspace is still reported as YCbCr.
This is a quirk of the plugin API: colorspace describes the encoded format, not necessarily the decoded plane layout.
So you get:
- colorspace = YCbCr
- chroma = 4:4:4
- planes = R, G, B
This is valid and expected.
iPhone HEIC thumbnails (and many Android ones) are:
- HEVC-coded
- Full-resolution YCbCr 4:4:4
- But decoded by libde265 or dav1d into a planar RGB layout
So the decoder gives you:
- R plane
- G plane
- B plane
instead of Y/Cb/Cr.
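If you hit this layout, handling it is mechanical: read the three planes and interleave them into packed RGB24. A minimal sketch, assuming for brevity that all three planes share one stride and have already been copied into managed arrays:

```csharp
using System;

static class PlanarToInterleaved
{
    // Interleave separate R, G, B planes (each `stride` bytes per row)
    // into a packed RGB24 buffer of width * height * 3 bytes.
    public static byte[] Interleave(byte[] r, byte[] g, byte[] b,
                                    int width, int height, int stride)
    {
        var rgb = new byte[width * height * 3];
        for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            int src = y * stride + x;        // position in the source plane
            int dst = (y * width + x) * 3;   // position in the packed output
            rgb[dst]     = r[src];
            rgb[dst + 1] = g[src];
            rgb[dst + 2] = b[src];
        }
        return rgb;
    }
}
```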
You're right to wonder if the thumbnail is already in a standard format. However, inside a HEIF (.heic) file, the thumbnail is not typically a separate JPEG or PNG file. Instead, it's a smaller, lower-resolution version of the main image that is compressed using the same codec: HEVC (High-Efficiency Video Coding).
The HEIF file is a container, and both the main image and its thumbnail are stored as HEVC-encoded data streams within that container.
This is the most critical point. The way we see images on a screen and the way they are efficiently compressed are often different.
- RGB (Red, Green, Blue): This is what our screens use. Each pixel is represented by a combination of red, green, and blue light. This is an "additive" color model, and it's how we typically think of digital color. Formats like PNG and JPEG store their final data in a way that can be easily translated to RGB.
- YCbCr: This is what HEVC (and many other video/image codecs, JPEG included) uses for compression. It separates the image into three components:
- Y: The luma component, which is essentially the black-and-white (brightness) information of the image.
- Cb & Cr: The chroma components, which represent the color information (blue-difference and red-difference).
The reason for this separation is a clever trick based on human biology. Our eyes are much more sensitive to changes in brightness (luma) than to changes in color (chroma).
Codecs like HEVC exploit this by keeping the Y (brightness) information at high resolution but reducing the resolution of the Cb and Cr (color) information. This is called chroma subsampling. It allows the codec to throw away a lot of color data that our eyes wouldn't have noticed anyway, leading to significantly smaller file sizes with very little perceptible loss in quality.
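To make the savings concrete, here is a small standalone sketch (not part of the library) computing per-plane sample counts for the most common subsampling, 4:2:0, where Cb and Cr are stored at half resolution in each dimension:

```csharp
using System;

static class Subsampling
{
    // For 4:2:0, the Y plane is full resolution while Cb and Cr are
    // halved in both width and height (rounded up for odd sizes).
    public static (long Y, long Cb, long Cr) PlaneSizes420(int width, int height)
    {
        long y = (long)width * height;
        long chromaW = (width + 1) / 2;
        long chromaH = (height + 1) / 2;
        return (y, chromaW * chromaH, chromaW * chromaH);
    }
}
```

For a 4032×3024 image this gives 12,192,768 luma samples but only 3,048,192 samples per chroma plane, exactly half the total data of an unsubsampled 4:4:4 image.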
So, when we use libheif to decode the thumbnail, we are extracting raw YCbCr pixel data. This data is not something a standard image viewer or library can directly understand or display. It's just a stream of numbers representing the Y, Cb, and Cr values for each pixel.
To make it a viewable image, we have to perform a mathematical conversion from the YCbCr color space back to the RGB color space. This is what the ConvertYCbCrToRgb method is doing. It takes the Y, Cb, and Cr values for each pixel and calculates the corresponding R, G, and B values.
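As a sketch of what such a conversion involves, here is a single-pixel version using the full-range BT.601 equations. The library's actual ConvertYCbCrToRgb may use different coefficients or ranges:

```csharp
using System;

static class ColorConvert
{
    // Full-range BT.601 YCbCr -> RGB for one pixel.
    // Cb and Cr are centered on 128; results are clamped to [0, 255].
    public static (byte R, byte G, byte B) YCbCrToRgb(byte y, byte cb, byte cr)
    {
        double yd  = y;
        double cbd = cb - 128.0;
        double crd = cr - 128.0;

        int r = Clamp(yd + 1.402 * crd);
        int g = Clamp(yd - 0.344136 * cbd - 0.714136 * crd);
        int b = Clamp(yd + 1.772 * cbd);
        return ((byte)r, (byte)g, (byte)b);
    }

    static int Clamp(double v) => (int)Math.Round(Math.Min(255.0, Math.Max(0.0, v)));
}
```

Neutral chroma (Cb = Cr = 128) yields a pure grayscale pixel, which is a quick sanity check for any implementation.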
- The thumbnail is a small HEVC-encoded image.
- HEVC uses the YCbCr color model for efficient compression.
- Our target format (PNG) and our screens use the RGB color model.
- Therefore, we must decode the HEVC data and convert the resulting YCbCr pixel data to RGB before we can save it as a PNG.
A stride is one of those concepts that suddenly makes everything about image buffers click into place once you see it clearly. Let me give you the version that’s actually useful when working with libheif and raw pixel planes.
Stride = the number of bytes per row in memory. Even if an image is, say, 200 pixels wide, the number of bytes used for each row in memory is not necessarily 200 × bytes-per-pixel.
That actual number is called the stride (sometimes “row pitch”).
Why stride exists: there are two main reasons.
- Alignment / padding: many image formats pad each row to 4-byte or 8-byte boundaries for performance. For example:
Image width: 201 pixels
Format: RGB24 (3 bytes per pixel)
Expected row size: 201 × 3 = 603 bytes
But 603 is not aligned
So the library might round up to 608 bytes
→ Stride = 608, not 603.
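The round-up in the example above is just alignment arithmetic, shown here as a standalone helper (not library code):

```csharp
static class Alignment
{
    // Round rowBytes up to the next multiple of `alignment`.
    public static int AlignedStride(int rowBytes, int alignment)
        => (rowBytes + alignment - 1) / alignment * alignment;
}
```

`AlignedStride(201 * 3, 8)` gives 608, matching the example; a row size already on the boundary is left unchanged.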
- Some formats store extra metadata or padding per row: YCbCr planes, interleaved formats, and GPU-friendly layouts often require padding.
Imagine each row of pixels is a shelf:
```
| pixel pixel pixel pixel ... padding padding |
```
The stride is the full width of the shelf, including padding.
The width × bytes-per-pixel is only the part containing actual pixels.
Why stride matters in your libheif wrapper: when you copy the decoded thumbnail into a byte array:

```csharp
int size = stride * height;            // total bytes, including per-row padding
Marshal.Copy(plane, buffer, 0, size);  // plane: native pointer returned by libheif
```
You are copying the entire padded rows, not just the visible pixels.
When you later convert to PNG or a Bitmap, you must respect the stride:
```csharp
for (int y = 0; y < height; y++)
{
    int srcOffset = y * stride;      // start of the padded source row
    int dstOffset = y * width * 3;   // start of the packed destination row
    // copy only width * 3 bytes of actual pixel data, skipping the padding
    Array.Copy(buffer, srcOffset, packed, dstOffset, width * 3);
}
```
If you ignore stride, your image will look skewed, shifted, or corrupted, or will have colored bars on the right.
Concrete example: let’s say libheif gives you:
- width = 200
- height = 100
- stride = 640
But RGB24 is 3 bytes per pixel:
expected row size = 200 × 3 = 600 bytes
stride = 640 bytes → 40 bytes of padding per row
So each row in memory looks like:
600 bytes of real pixels + 40 bytes of padding
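Putting those numbers together, a self-contained sketch (buffer contents are made up for the demo) shows how de-striding recovers exactly width × 3 bytes per row:

```csharp
using System;

static class Destride
{
    // Copy `height` rows of `width * 3` real bytes out of a padded
    // RGB24 buffer, dropping the per-row padding.
    public static byte[] RemovePadding(byte[] padded, int width, int height, int stride)
    {
        int rowBytes = width * 3;                 // RGB24: 3 bytes per pixel
        var packed = new byte[rowBytes * height];
        for (int y = 0; y < height; y++)
            Array.Copy(padded, y * stride, packed, y * rowBytes, rowBytes);
        return packed;
    }
}
```

For width = 200, height = 100, stride = 640, the input buffer is 64,000 bytes but the packed output is only 60,000 bytes: the 40 padding bytes per row are gone.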