# Combining images: means, medians, and standard deviations

Apr 15 08 08:57 Filed in: Ask Craig

**Q: I hear medians are a good way to stack images as they can remove things like hot pixels, cosmic rays, or streaks from satellites. Does Nebulosity support this?**

The short answer is no ... but... When combining images we want something that helps reduce the noise. We'd also like something that is tolerant of "outliers". The mean (average) is great at the first part but lousy at the second part. Medians are not so hot at the first part but great on the second part. What we'd like is something that is good at both parts. Nebulosity supports standard-deviation based filtering of your images to let you keep most of the frames and pitch just the bad ones.

**OK, so what is it and why is it better? What are these 1.5, 1.75, etc. thresholds I'm being asked about?**

If you were to take a perfect image of a target, each pixel would have its "ideal" or "true" value - how much intensity there is from that part of the target. The trouble is, each time we sample the target (aka each image) we get that true value for a pixel but we also get some noise on top of it. We want, of course, the true value. How do we get rid of that noise?

In statistics, we have several typical ways of describing our data. Think right now just about a single pixel (after alignment). So, we have the same spot in the arm of M51 or something. The most common way is the mean (aka average) of all of our samples (aka images, light frames, etc.). It should tell us the central tendency and therefore estimate the truth. The more samples we have, the better the estimate is since we don't have to rely on just one sample (which has truth plus or minus some noise value) or a few samples. With more samples, the noise tends to cancel and we are left with a closer estimate of the truth (the noise, BTW, tends to follow a 1/sqrt(# samples) rule). We can quantify how much noise there is in our samples with a second way of describing our data. The variance (and its square root, the standard deviation) are the typical ways we do this, telling us how much "spread" there is in our samples.
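To make that 1/sqrt(# samples) rule concrete, here's a minimal numpy sketch (nothing Nebulosity-specific; the "true" pixel value and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 100.0   # the pixel's "true" intensity (made up)
noise_sd = 10.0      # per-frame noise on top of it (made up)

for n in (4, 16, 64, 256):
    # 10000 repetitions of stacking n frames of this one pixel
    samples = true_value + rng.normal(0.0, noise_sd, size=(10000, n))
    means = samples.mean(axis=1)
    # the noise left after averaging shrinks like noise_sd / sqrt(n)
    print(f"{n:4d} frames: measured SD {means.std():.3f}, "
          f"predicted {noise_sd / np.sqrt(n):.3f}")
```

Averaging 4 frames roughly halves the noise; averaging 256 cuts it by a factor of 16.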

If we assume the data are "normal", about 68% of all samples will lie within one standard deviation (SD) of the mean (that is, 68% are less than one standard deviation above or one standard deviation below the average). About 95% lie within 2 SD of the mean. Below, I show the histogram of 5000 normally-distributed random numbers (pretend you had 5000 light frames!). Samples in green lie within 1 SD of the mean. Yellow (and green) lie within 1.5 SD. Orange (and yellow and green) are within 2 SD and red are outside of 2 SD.

Now, these are all real samples (nothing like an errant satellite) but we could safely pitch those samples in red or orange and still have a good estimate of the mean. We'd not lose too many samples and we'd take out those that are more likely to be outliers. If a sample comes in that is > 2 SD, odds are pretty good it's an errant value (like a hot pixel or satellite). Even if it's not, since we don't have 5000 samples - we have far fewer - filtering these out will help keep our estimate of the mean centered where it should be and not skewed by that outlier. Thinking about this diagram will help us a lot in the next step - understanding what happens during stacking. Just remember that with the standard deviation, we know what kind of values we might expect to find and what kind of values are really abnormal (e.g., something 5 SD from the mean is very abnormal, as there is only a 0.000057% chance this is a real sample and not the result of something else going on).
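You can check those percentages yourself with a quick simulation along the lines of the histogram above (a sketch for illustration, not Nebulosity code):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=5000)  # pretend: 5000 light frames of one pixel
mean, sd = samples.mean(), samples.std()

fractions = {}
for k in (1.0, 1.5, 1.75, 2.0):
    # fraction of samples within k standard deviations of the mean
    fractions[k] = float(np.mean(np.abs(samples - mean) <= k * sd))
    print(f"within {k} SD: {fractions[k]:.1%}")
```

With normally-distributed data these come out near 68%, 87%, 92%, and 95% - so even a 2 SD filter throws away only about 5% of the genuinely good samples.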

OK, given that background, here is what happens during the stack. For each (aligned) pixel, we calculate the mean and standard deviation across all of the images in the stack. If your SD threshold is at 1.5, any samples of that pixel that have an intensity beyond 1.5 SD from the mean are removed and a new average, excluding these samples, is calculated. This, BTW, is why hot pixels are often eliminated using SD stacking - those hot pixel values are very abnormal and lie far away from the mean.
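Here is what that per-pixel filtering looks like as a small numpy sketch (my own illustration of the general sigma-clipping idea, not Nebulosity's actual code; the stack values are made up):

```python
import numpy as np

def sd_filtered_stack(frames, threshold=1.5):
    """For each pixel: drop samples beyond `threshold` SD of that
    pixel's mean across the stack, then re-average the survivors.
    frames has shape (n_frames, height, width)."""
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    sd = frames.std(axis=0)
    keep = np.abs(frames - mean) <= threshold * sd  # per-pixel decision
    return np.where(keep, frames, 0.0).sum(axis=0) / keep.sum(axis=0)

# toy 5-frame stack of a 2x2 patch; frame 0 has a "hot pixel" at (0, 0)
stack = 100.0 + np.random.default_rng(1).normal(0.0, 2.0, size=(5, 2, 2))
stack[0, 0, 0] = 4000.0
result = sd_filtered_stack(stack, threshold=1.5)
print(result)  # every pixel lands near 100; the 4000 sample is rejected
```

A plain mean would put the (0, 0) pixel near 880; the filtered stack keeps it near 100, because the 4000 sample sits far beyond 1.5 SD of that pixel's mean.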

With the filter set at 1.75, it takes a more extreme or "outlying" intensity value to be counted as "bad" than at 1.5. At 2.0, it takes an even more abnormal value to be excluded. Thus, more samples go into the final image with a higher threshold (and more things like semi-hot pixels as well). Typically, filtering at 1.5 or 1.75 will yield the best results.

Standard-deviation based stacking therefore lets in more good samples than a median does and, unlike the mean (average), actually rejects bad samples. That's what makes it such a nice technique for filtering out bad samples. Note, you're not filtering out whole frames - this math is done on each pixel. So, frame #1 may have a bad value at pixel 109,231 but be great everywhere else. For all other pixels, this frame's data will be used, but for pixel 109,231 it won't.
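A tiny self-contained example of that per-pixel bookkeeping (hypothetical numbers; the clipping rule is the general one described above, not Nebulosity's exact code):

```python
import numpy as np

# 4 frames of a 1x2 image; frame 0 is bad only at the first pixel
stack = np.array([[[900.0, 101.0]],
                  [[100.0,  99.0]],
                  [[102.0, 100.0]],
                  [[ 98.0, 102.0]]])
mean = stack.mean(axis=0)
sd = stack.std(axis=0)
keep = np.abs(stack - mean) <= 1.5 * sd  # per-frame, per-pixel decision

print(keep[:, 0, 0])  # frame 0 rejected at the bad pixel
print(keep[:, 0, 1])  # frame 0 still contributes at the good pixel
```

Frame 0 gets excluded only where its value is aberrant; its data at every other pixel still goes into the stack.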

The technique isn't perfect. With a lot of outliers, the estimate of the standard deviation goes way up. So, we have a bigger "spread" that is considered normal, and it takes something more aberrant to get filtered out. There are, of course, techniques to get around this as well, but that's a topic for another day.