BiRefNet variants explained: which one to use for portraits, products, and complex scenes

BiRefNet is not one model. It is a family: General, Portrait, HR, Matting, COD, and more. This guide breaks down what each variant does well and where it falls short so you can pick the right one for your workflow.


BiRefNet is easy to misunderstand because the name gets used two ways. Sometimes people mean the research architecture from the CAAI AIR paper "Bilateral Reference for High-Resolution Dichotomous Image Segmentation." Other times they mean one of the public checkpoints built from that architecture: BiRefNet General, BiRefNet Portrait, BiRefNet Matting, BiRefNet HR, BiRefNet COD, and a few related variants.

That distinction matters more than it sounds. Background removal is not one problem. A catalog shoe on a white sweep, a person with flyaway hair, a glass perfume bottle, and a bird hidden in branches all ask for different behavior from the mask. One variant can look clean on products and blunt on hair. Another can preserve soft edges beautifully and then make a hard plastic object look slightly hazy.

The practical answer is not "use the newest one." It is to pick the checkpoint that matches the image.

Why BiRefNet became the default open option

The original BiRefNet paper frames the task as dichotomous image segmentation, which is a formal way of saying the model separates the important foreground from everything else. The research contribution is the bilateral reference mechanism. In plain English, BiRefNet tries to keep the whole image in mind while still paying attention to local edge detail.

That is exactly where older background removal models often stumble. A model that sees only the whole frame may understand that a person is the subject, but it can shave off hair, fingers, strings, cables, or lace. A model that stares too closely at patches may preserve edge texture while losing the larger question of what belongs to the subject.

BiRefNet combines global localization with detail reconstruction. The paper reports broad benchmark gains across high resolution dichotomous segmentation, camouflaged object detection, high resolution salient object detection, and ordinary salient object detection. The public Hugging Face model card also notes that the main weights are MIT licensed, come in around 0.2 billion parameters, and are widely used through Spaces and downstream tools.

There is still a catch: benchmark wins do not mean every BiRefNet variant is best for every image. The model family exists because the training data changes the behavior.

BiRefNet General

Start with BiRefNet General when you do not have a strong reason to choose something else. It is the most sensible first pass for mixed content: products, ordinary objects, people in simple scenes, food, furniture, and the usual "remove the background from this image" jobs.

The General variant has the best day-to-day balance because it is not overly specialized. It tends to keep object boundaries crisp, does not assume the subject is always a person, and is less likely than a matting-focused model to soften an edge that should stay sharp.

I would use it first for e-commerce product photos, marketplace thumbnails, social posts, object cutouts, and images where the subject is visually clear. If the result is already clean, there is no prize for switching models. Save the special variants for the images that give General trouble.
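If you want to try it outside a hosted tool, the public model card shows a short transformers-based recipe. Below is a minimal sketch of that pattern; the file names are placeholders, and you should confirm the repo id and preprocessing against the card itself.

```python
# Minimal BiRefNet General inference sketch, following the pattern on
# the public Hugging Face model card. File names are placeholders.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

model = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True
)
model.eval()  # move to "cuda" if you have a GPU

# The general checkpoint works at 1024x1024 with ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("product.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    # The forward pass returns multi-scale outputs; the last entry is
    # the final prediction, squashed to [0, 1] with a sigmoid.
    pred = model(batch)[-1].sigmoid().cpu()

# Resize the mask back to the source dimensions and save it.
mask = transforms.ToPILImage()(pred[0].squeeze()).resize(image.size)
mask.save("mask.png")
```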

BiRefNet Portrait

BiRefNet Portrait is for people. More specifically, it is for people in normal photographic compositions: headshots, profile photos, fashion images, creator thumbnails, staff pages, and full body shots where the person is the obvious subject.

The Portrait checkpoint was trained with portrait-oriented data including P3M-10k and TR-humans. On the P3M portrait test set, the public card reports an S-measure of 0.983 and mean absolute error of 0.006. Those numbers are not a guarantee for your studio shoot, but they do line up with the intended use: clean human extraction.

Portrait is usually worth trying when General clips shoulders, hair, hands, or clothing edges. It is less convincing when the person is part of a crowded scene, partly hidden, heavily motion-blurred, or dressed in colors that blend into the background. For those images, the problem may be ambiguity rather than "portraitness."
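If General keeps clipping hair or clothing on a clearly human subject, switching to Portrait is a one-line change against the sketch above. The repo id here is my assumption; confirm it against the official model zoo.

```python
# Same loading call as the General sketch; only the checkpoint changes.
# Repo id is an assumption -- verify against the official model zoo.
from transformers import AutoModelForImageSegmentation

portrait = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet-portrait", trust_remote_code=True
)
```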

BiRefNet Matting

BiRefNet Matting is the variant I reach for when the edge itself is the job.

A normal segmentation mask often behaves like a yes-or-no cutout. Matting is more careful with partial transparency. That matters for hair, fur, smoke, translucent fabric, veils, mesh, glass, motion blur, and any boundary where the foreground fades into the background instead of ending cleanly.

The public BiRefNet Matting card lists training data from several matting sets, including P3M-10k, AM-2k, AIM-500, Human-2k, Distinctions-646, HIM2K, and PPM-100. Its reported P3M non-portrait test result includes an S-measure of 0.979 and mean squared error of 0.003. Again, read that as evidence of training focus, not a promise that every strand will survive.

The downside is softness. If you use Matting on a hard-edged product, it can make the boundary feel a little less decisive. For a metal chair, a phone case, or a cardboard box, BiRefNet General often looks better because the edge should be firm.
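The payoff of a soft matte only shows up if you composite with it rather than thresholding it. Here is a small Pillow-only sketch that treats the predicted mask as an alpha channel; it assumes you already saved an image and its matte as in the earlier snippet.

```python
# Use a matting-style prediction as a soft alpha channel instead of a
# hard cutout. Assumes "portrait.jpg" and its matte "mask.png" exist.
from PIL import Image

image = Image.open("portrait.jpg").convert("RGB")
matte = Image.open("mask.png").convert("L").resize(image.size)

# Attach the matte as the alpha channel. Partial values keep hair and
# translucent edges semi-transparent instead of snapping them to pure
# foreground or pure background.
cutout = image.copy()
cutout.putalpha(matte)
cutout.save("cutout.png")

# Composite onto a new background. alpha_composite respects the soft
# edge, so strands blend rather than ending in a hard line.
background = Image.new("RGBA", image.size, (240, 240, 240, 255))
result = Image.alpha_composite(background, cutout)
result.convert("RGB").save("composited.jpg")
```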

BiRefNet HR

BiRefNet HR is trained for higher resolution input, with public documentation describing 2048 by 2048 training. Use it when your source image has enough real detail to justify the extra work: print assets, large product photos, detailed illustrations, high resolution packshots, and close crops where a weak edge will be obvious.

The HR model card reports a clear benchmark gap on DIS validation data when compared at high resolution. BiRefNet HR shows an S-measure of 0.927 and mean absolute error of 0.026 at 2048 by 2048, while the standard general checkpoint is listed at 0.898 and 0.037 in the same high resolution comparison.

That does not mean HR is always better. If your upload is small, compressed, or destined for a tiny thumbnail, HR may only spend more compute on detail that is not there. Use it when the output will be inspected, composited, printed, or zoomed.
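In practice, the only inference-side changes for HR in the earlier sketch are the checkpoint and the input resolution. The repo id and the 2048 by 2048 size below reflect the public documentation, but treat both as things to verify.

```python
# HR variant: larger checkpoint, larger input. Repo id is an
# assumption -- check the model zoo for the exact name.
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

hr_model = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet_HR", trust_remote_code=True
)

hr_preprocess = transforms.Compose([
    transforms.Resize((2048, 2048)),  # matches the documented training size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```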

BiRefNet HRSOD

BiRefNet HRSOD sits close to HR, but the intent is slightly different. HRSOD means high resolution salient object detection. Salient object detection is about finding the visually dominant subject in an image.

That makes HRSOD useful when the subject is obvious to a human viewer but the image contains a lot of high resolution texture: a bag on a patterned rug, a ceramic piece on a busy table, a plant against detailed wallpaper. It is less of a portrait tool and more of a "please find the main thing in this detailed frame" tool.

If you are choosing between HR and HRSOD, I would try HR for high resolution cutout fidelity and HRSOD when the subject is prominent but the scene is visually busy.

BiRefNet COD

BiRefNet COD is the odd one, and it is more useful than its name suggests. COD means camouflaged object detection. The official model zoo lists COD training on COD10K and CAMO data, which are built around objects that blend into their surroundings.

That sounds like wildlife photography, and yes, it helps there. It also maps to common commercial problems: a beige product on a beige surface, a patterned dress against patterned wallpaper, white packaging on a white set, or a pet on a textured blanket.

Do not start with COD for clean product photos. It can overthink images that are already easy. Use it when the normal variants lose the subject because the background has the same color, pattern, or contrast.

BiRefNet DIS and the large training variants

You may also see BiRefNet DIS or large DIS5K-flavored variants in provider lists. These are closer to the research side of the family. DIS is the benchmark task behind the original paper, and DIS5K is one of the important datasets in that world.

For normal background removal, I would not make DIS your first choice unless you are testing segmentation behavior or comparing research checkpoints. In a production editing workflow, General, Portrait, Matting, HR, HRSOD, and COD are easier to reason about because their names describe the image problem you are trying to solve.

How BiRefNet compares with Bria RMBG

Bria RMBG v2.0 complicates the story because it is also built on BiRefNet architecture. Bria's public repository says RMBG v2.0 uses BiRefNet with Bria's own dataset and training scheme. It also says the model was trained on more than 15,000 manually labeled, fully licensed images covering objects, people, animals, text, photorealistic content, and non-photorealistic content.

So the comparison is not "BiRefNet versus a totally unrelated model." It is closer to open BiRefNet variants versus a commercially packaged BiRefNet-derived model with a strong focus on licensed training data.

If your main concern is cutout quality across everyday images, BiRefNet General and Bria RMBG are both worth testing. If your concern is training data provenance or commercial licensing, Bria's positioning is clearer. If your concern is choosing the right specialist for a hard image, the BiRefNet family gives you more knobs: Portrait for people, Matting for soft edges, HR for large images, COD for low-contrast subjects.
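If you route images programmatically, that decision logic fits in a small dispatch table. The repo ids below are my best reading of the public model zoo and Bria's card, not verified constants; check them before wiring this into anything.

```python
# Checkpoint picker matching the decision rules above. All repo ids
# are assumptions -- verify them against the model zoo before use.
CHECKPOINTS = {
    "general":  "ZhengPeng7/BiRefNet",           # mixed content, first pass
    "portrait": "ZhengPeng7/BiRefNet-portrait",  # people as the clear subject
    "matting":  "ZhengPeng7/BiRefNet-matting",   # hair, fur, translucency
    "hr":       "ZhengPeng7/BiRefNet_HR",        # large inputs, print assets
    "hrsod":    "ZhengPeng7/BiRefNet-HRSOD",     # busy high resolution scenes
    "cod":      "ZhengPeng7/BiRefNet-COD",       # low-contrast subjects
    "rmbg":     "briaai/RMBG-2.0",               # licensed-data alternative
}

def pick_checkpoint(kind: str = "general") -> str:
    """Return a checkpoint id for the image problem, defaulting to General."""
    return CHECKPOINTS.get(kind, CHECKPOINTS["general"])
```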

Where you can run it

The official weights are on Hugging Face, and the GitHub repository points users to the model zoo, demos, and deployment options. The project page also mentions hosted inference through fal, while Replicate has a public BiRefNet endpoint with millions of runs. Runware exposes a background removal tool with several BiRefNet variants, including General, Portrait, Matting, HRSOD, COD, and speed-oriented options.

That spread is part of why BiRefNet shows up everywhere. You can run it locally if you are comfortable with Python and GPU setup, or use a hosted provider if you just need output. For most creators and small teams, hosted access is the saner choice. The time you save on environment setup is usually worth more than the tiny per-image cost.
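As one example of the hosted route, Replicate's Python client reduces the whole pipeline to a single call. The model slug below is a placeholder, not a real identifier; look up the actual BiRefNet endpoint on Replicate first.

```python
# Hosted inference sketch via Replicate's Python client. The slug is
# a placeholder -- find the real BiRefNet endpoint before running.
import replicate

output = replicate.run(
    "OWNER/birefnet:VERSION",  # placeholder, not a real model id
    input={"image": open("product.jpg", "rb")},
)
print(output)  # hosted endpoints typically return a URL to the result
```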
