AnarchismIsCool 3 months ago

In JPEG*

The title should reflect that. Otherwise it's misleading; there are a million ways of representing grayscale in data.

Sure, you can argue just picking a color channel and making it the grayscale source is not gonna work as expected, but what if we're talking about the data off a pan sensor vs an RGB array? What about a pan sensor created by taking an RGB sensor and removing the filter so it's producing an image with 3x the resolution but in pan?

This is a complicated question.

  • kadoban 3 months ago

    > In JPEG*

    This should end up being true for any presentation image format (the one you're going to actually show people, not necessarily the one you might do editing work on) optimized for human vision as well.

    It's more a property of our eyes than a quirk of JPEG. So I don't find it misleading personally.

    • AnarchismIsCool 3 months ago

      I'd argue compressed file formats make up a lot less of the broader problem space than most people realize. If you're doing anything high-performance (where the difference matters) but not immediately consumer-facing, you're spending a lot of time working with lossless/uncompressed imagery in RAM or on disk. For stuff like a MIPI->CUDA pipeline for classification, this question can go a number of different ways very quickly.

      Keep in mind, at least in my experience, if you're even remotely thinking about doing anything with grayscale and this question comes up you're probably doing something...interesting (read: multi band remote sensing shenanigans).

  • dTal 3 months ago

    > What about a pan sensor created by taking an RGB sensor and removing the filter so it's producing an image with 3x the resolution but in pan?

    This got me thinking about how subtle and slippery the concept of "resolution" is.

    The relationship is not strictly 3x; an ideally demosaiced Bayer filtered image still contains some information at the original capture resolution, because R, G, and B are correlated in nontrivial ways. Nyquist is a nice and simple result for the special case where nothing can be predicted about the original function, measurements are assumed to be perfect, and the criterion for reconstruction is "exact" - but in the general case of "recovering information, more is better" even a simple interpolation procedure with tame priors like "luminance exists" might perform so much better than the naive approach that we'd be silly not to use it.

    Even the Nyquist frequency itself is less fundamental than it first appears; a less well-known result is that a function can also be perfectly recovered if sampled at half the Nyquist rate, provided the slope is captured as well as the value. In other words you can losslessly convert a 2MP image into a 1M value+slope data matrix and back. What resolution is it? Depends on your point of view.

    • krapht 3 months ago

      The same idea is used when doing complex IQ sampling of radio frequency data. I don't know if it's as obscure as you say; it's just not commonly considered when dealing with plain cameras. It's a bigger deal when working in the RF domain, or SAR imagery...

  • generalizations 3 months ago

    Yup. "Do boolean matrices take up less space than fp32 matrices?"

    • financltravsty 3 months ago

      Depends on compression ;)

      • generalizations 3 months ago

        Heh. Takes a scifi-level compression algorithm to make 32 bits comparable to 1 bit...or you'd need to get really lucky with your sparse matrix ;)

        • kbelder 3 months ago

          But the other way around, it's all too common that a 1 bit array is stored equivalently to a 32 bit array.

          • generalizations 3 months ago

            Yes, indeed. If a larger data type is used to represent the information, it will most certainly take up more space.
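
            A quick numpy sketch of that gotcha (my own illustration; the sizes are arbitrary): a numpy bool array is stored one byte per element, so the naive win over float32 is 4x, not 32x, until you actually pack the bits.

                import numpy as np

                a_f32 = np.ones((1024, 1024), dtype=np.float32)  # 4 bytes per element
                a_bool = np.ones((1024, 1024), dtype=np.bool_)   # 1 byte per element, not 1 bit
                a_packed = np.packbits(a_bool)                   # 8 elements per byte

                print(a_f32.nbytes, a_bool.nbytes, a_packed.nbytes)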

  • GuB-42 3 months ago

    This would be the case for any lossy format with decent compression. We are much less sensitive to color than we are to brightness so we can compress color much more than we can compress brightness for the same perceived quality.

    Not only that but color and brightness are often correlated, and advanced compression techniques use that property, see "chroma from luma" in AV1.

    Without compression, indeed, it can be 3x, or even 4x (for alignment).

dynm 3 months ago

This post jumps into using "4:4:4" / "4:2:2" / "4:2:0" notation without seeming to explain what it means. Would be very helpful to add a short explanation!

  • lifthrasiir 3 months ago

    And that notation is not really intuitive even when you do understand what it means. For example, the initial 4 is technically a denominator for the other two numbers, but it is not intended to be normalized! A consistent notation would use the reciprocal instead, say, "1-1" for 4:4:4, "2-2" for 4:2:2 and "1-∞" or "1-x" for 4:4:0.
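
    For what it's worth, here's a rough numpy sketch of what the common ratios mean in practice (my own illustration, not from the article). Real encoders filter rather than just drop samples, but the chroma plane sizes come out the same:

        import numpy as np

        h, w = 480, 640
        cb = np.random.randint(0, 256, (h, w), dtype=np.uint8)  # one chroma plane

        cb_444 = cb            # 4:4:4 - full-resolution chroma
        cb_422 = cb[:, ::2]    # 4:2:2 - half the columns, all rows
        cb_420 = cb[::2, ::2]  # 4:2:0 - half the columns and half the rows

        for name, plane in [("4:4:4", cb_444), ("4:2:2", cb_422), ("4:2:0", cb_420)]:
            print(name, plane.shape, plane.nbytes, "bytes")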

vander_elst 3 months ago

I don't understand the point of the article. It seems to me the point is that non-compressed grayscale images don't take a third of the space of a compressed image? If we look at the last example, the first image (4:4:4) is 150KB and the grayscale one is 50KB (exactly 1/3, it seems); the compressed images in between take less space, but the quality is also much worse, which I would say is as expected. What am I missing?

morsch 3 months ago

I appreciate the article, but isn't the example image of a grey bird on cobblestone looking at a dark pool of water an odd choice to illustrate color subsampling?

immibis 3 months ago

The Y channel of the 4:4:4 "sub"sampled image looks sharper than the other comparison images. Look at the ground next to the bird's feet. I perceive it as sharper in the 4:4:4 image even though it's just about a single colour and the luma was supposedly not subsampled.

  • wmil 3 months ago

    Yeah the "without visual impact" bit is a lie. People working in image / video compression are in denial about the noticeable quality drop with 4:2:0. 4:2:0 is extremely noticeable if your computer monitor gets into that mode.

    There are some major failure cases involving alternating single pixel width red / black lines. That caused issues with DVDs because the format also used interlaced frames, meaning things like red tail lights at night could end up looking awful.

    Luckily AFAIK they've given up on interlacing.

    I really think with modern tech they'd be better off trying to compress 4:4:4 instead of immediately throwing out a lot of colour information.
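
    A rough sketch of that failure case (my own toy example, using the JFIF full-range YCbCr formulas and plain 2x2 chroma averaging rather than a real encoder): alternating single-pixel red and black rows come back as two muddy shades of red once the chroma has been averaged.

        import numpy as np

        def rgb_to_ycbcr(rgb):
            r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
            y  =       0.299 * r    + 0.587 * g    + 0.114 * b
            cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
            cr = 128 + 0.5 * r      - 0.418688 * g - 0.081312 * b
            return y, cb, cr

        def ycbcr_to_rgb(y, cb, cr):
            r = y + 1.402 * (cr - 128)
            g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
            b = y + 1.772 * (cb - 128)
            return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)

        # Alternating single-pixel rows: red, black, red, black...
        img = np.zeros((4, 4, 3), dtype=np.float64)
        img[0::2, :, 0] = 255

        y, cb, cr = rgb_to_ycbcr(img)
        # "4:2:0": average chroma over 2x2 blocks, then repeat it back up (luma untouched)
        for c in (cb, cr):
            blocks = c.reshape(2, 2, 2, 2).mean(axis=(1, 3))
            c[:] = np.repeat(np.repeat(blocks, 2, axis=0), 2, axis=1)

        print(ycbcr_to_rgb(y, cb, cr)[:2, 0])  # one pixel from a red row, one from a black row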

arh68 3 months ago

I remember white/black dithering (even at twice the DPI) being a lot smaller than the greyscale JPEG when I was scanning documents. Greyscale was a bit smaller than full color, but dithered was a big step up (er, down).
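
If anyone wants to reproduce that, here's a quick Pillow sketch (the file name is a placeholder, and the exact sizes obviously depend on the scan):

    import io
    from PIL import Image

    img = Image.open("scan.png")   # placeholder: any scanned page

    gray = img.convert("L")        # 8-bit grayscale
    dithered = img.convert("1")    # 1-bit, Floyd-Steinberg dithered

    def size(im, fmt, **kw):
        buf = io.BytesIO()
        im.save(buf, fmt, **kw)
        return len(buf.getvalue())

    print("grayscale JPEG:     ", size(gray, "JPEG", quality=75))
    print("dithered 1-bit PNG: ", size(dithered, "PNG"))
    print("dithered G4 TIFF:   ", size(dithered, "TIFF", compression="group4"))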

  • kccqzy 3 months ago

    When I scanned a lot of documents about ten years ago, I empirically found that JBIG2 is so much better than anything else including JPEG. That's specially optimized for scanning text documents.

  • crazygringo 3 months ago

    For text documents with line drawings? For sure, absolutely -- especially when it's JBIG compression like fax machines use (or even Group 4 before that).

    If it's a full scale photograph full of grayscales that's been dithered, then it's a harder comparison to make. Because obviously a ton of useful detail is lost in the dithering, in a way that isn't with lines and letterforms. So the comparison would really have to be with a terrible, blocky JPEG super-compressed file. In theory the JPEG should win.

  • gwbas1c 3 months ago

    What were you storing? If the image was predominantly black and white, this would make sense.

  • omoikane 3 months ago

    I was honestly expecting this to be an article about grayscale versus dithered black and white, where the answer to "do grayscale images take less space" could go either way. Instead, it's an article about grayscale versus sRGB.

xxr 3 months ago

Somewhat related, a few years ago I noticed that an image with gaussian blur applied to it takes up significantly less space in JPEG than does the original image. I suppose it works out because the blur diffuses information throughout a particular area such that each JPEG block has less detail to encode internally?

  • alephxyz 3 months ago

    A Gaussian blur is essentially a low-pass filter, so all the high-frequency components are gone. The resulting image can be represented and stored using fewer DCT coefficients.

    This page has a few visual examples at the bottom: https://www.cs.unm.edu/~brayer/vision/fourier.html
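
    A small sketch of that effect (my own illustration, not from the linked page): quantize the 2D DCT of a block before and after a Gaussian blur and count how many coefficients survive.

        import numpy as np
        from scipy.fft import dctn
        from scipy.ndimage import gaussian_filter

        rng = np.random.default_rng(0)
        block = rng.integers(0, 256, (8, 8)).astype(float)  # stand-in for a detailed 8x8 block
        blurred = gaussian_filter(block, sigma=1.5)

        def nonzero_after_quantization(b, step=16):
            coeffs = dctn(b - 128, norm="ortho")
            return int(np.count_nonzero(np.round(coeffs / step)))

        print("original:", nonzero_after_quantization(block))
        print("blurred: ", nonzero_after_quantization(blurred))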

  • pornel 3 months ago

    JPEG's compression is very similar to blurring.

    It converts the image into the frequency domain and reduces the precision of the frequency data, with a special case when a coefficient rounds down to zero.

    However, a whole-image blur crosses the 8x8 block boundaries, so it isn't perfectly aligned with the blur that JPEG uses. Lowering the quality setting in JPEG (or using custom quantisation tables) will be more effective.

    There's also an information-theoretic reason for this: lower frequencies carry less information.

  • tsukurimashou 3 months ago

    Blurring averages pixels, which is essentially what compression does too.

bawolff 3 months ago

Is it just the subsampling?

Naively I would assume that even if you keep the unnecessary Cb/Cr channels in a greyscale image, they are going to compress really well, since they have very little information in a greyscale image.
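
Easy enough to measure with Pillow (a quick sketch, not from the article; the gradient is just stand-in grayscale content):

    import io
    import numpy as np
    from PIL import Image

    g = np.tile(np.arange(256, dtype=np.uint8), (256, 1))  # synthetic grayscale gradient

    gray = Image.fromarray(g)    # 2-D uint8 array -> "L" (grayscale) image
    rgb = gray.convert("RGB")    # same pixels, stored as three channels

    def jpeg_size(im):
        buf = io.BytesIO()
        im.save(buf, "JPEG", quality=90)
        return len(buf.getvalue())

    print("L-mode JPEG:  ", jpeg_size(gray))
    print("RGB-mode JPEG:", jpeg_size(rgb))

My expectation is that the flat Cb/Cr planes mostly collapse to per-block DC terms, so the gap is real but much smaller than 3x.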

  • plorkyeran 3 months ago

    Storing photos and sufficiently photo-like images as YCbCr rather than RGB does indeed tend to make them more compressible even without subsampling.

  • wmil 3 months ago

    JPEG / JFIF isn't really smart enough to take advantage of that. It's a file format from 1992; CPU speed was limited.

    • bawolff 3 months ago

      JPEG does entropy encoding.

jdefr89 3 months ago

I wonder if grayscale images are more efficient for compression to some degree. Too lazy to actually test or even explain why I am wondering...

Dwedit 3 months ago

Yes. No Cb or Cr channels.

fedeb95 3 months ago

This may be true for screens, but what about printing?

atoav 3 months ago

Does a mono sound file take less space than a stereo sound file?

Yeah, half as much. Provided you don't store a mono sound in a stereo audio file.

Same thing for grayscale images. If your white is #FFFFFF instead of #FF, your picture is a color file that coincidentally displays a grayscale image.

simonblack 3 months ago

Roughly a third.

  • valine 3 months ago

    We don't store color data in full precision usually. People aren't sensitive to all colors equally; the less sensitive the eye is to a particular color, the more efficiently you can store it.

    You can also typically discard high-frequency data in your color channels. So not only are you not storing full-precision data, you also don't need to store roughly a third of the frequency bins. You can think of this like cropping the image, but in the frequency domain. The data savings are the same as cropping the image, but visually imperceptible.

    • shagie 3 months ago

      This shows up in CCD sensors with a Bayer filter, where you have two green pixels, one red, and one blue in each 2x2 square.

      You can see the sensitivity weighting in the ppmtopgm program.

      ppmtopgm (https://netpbm.sourceforge.net/doc/ppmtopgm.html)

          ppmtopgm reads a PPM as input and produces a PGM as output. The output is a "black and white" rendering of the original image, as in a black and white photograph. The quantization formula ppmtopgm uses is y = .299 r + .587 g + .114 b.
      
      
      Note the coefficient for blue is 0.114 and green is 0.587.
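
      The same weighting is a one-liner if you want to apply it yourself (a numpy sketch using the coefficients quoted above):

          import numpy as np

          def to_gray(rgb):  # rgb: (..., 3) array
              weights = np.array([0.299, 0.587, 0.114])
              return (rgb @ weights).round().astype(np.uint8)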

      There's also a spoof of Late Lament ( https://youtu.be/VNC54BKv3mc?t=109 ) that gives me a chuckle...

          Cold-hearted orb that rules the night
          Removes the colors from our sight
          Red is gray, and yellow white
          But we decide which is right
          And which is a quantization error.