Failure in Handling Missing Data
As a data scientist, I often receive data from people who fail to describe, analyse, or even acknowledge what is missing. This is frustrating, because missing data is often of the utmost importance: conclusions may change once it is accounted for. A few do not even seem to appreciate that a conventional regression includes only the rows with complete data.
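To make the point about complete rows concrete, here is a minimal sketch, assuming pandas and a small made-up table (the column names `age`, `income` and `score` are hypothetical). It does not fit a model; it only mimics the listwise deletion that a complete-case regression performs, so you can see how many rows would actually be used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 6 rows, with gaps in the predictors and the response.
df = pd.DataFrame({
    "age":    [23, 35, np.nan, 41, 52, 29],
    "income": [31.0, np.nan, 45.0, 52.0, 61.0, np.nan],
    "score":  [3.1, 4.0, 4.4, np.nan, 5.9, 3.5],
})

# Listwise deletion: keep only rows with no missing value in any model column.
model_columns = ["age", "income", "score"]
complete = df.dropna(subset=model_columns)

n_total = len(df)
n_used = len(complete)
print(f"rows in data: {n_total}, rows a complete-case fit would use: {n_used}")

# Share of missing values per column, worth reporting alongside any model.
missing_share = df.isna().mean()
print(missing_share)
```

No single column here is missing more than a third of its values, yet only two of the six rows are complete. Reporting both numbers before modelling is a cheap habit.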
What is missing data?
We say that there is missing data if values are absent for some observations. Missing values may be coded as NAN, ., an empty cell (“”), or a common numeric code (often -99 or 99).
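The codes above can be turned into proper missing values when the file is read. The following is a sketch only, assuming pandas and an invented CSV (the columns `id`, `age` and `income` and all of the values are hypothetical). It uses the `na_values` argument of `pandas.read_csv`, keyed by column, so that a code such as 99 is treated as missing only where it really is a code.

```python
import io

import pandas as pd

# Hypothetical raw file mixing several missing-value codes.
raw = io.StringIO(
    "id,age,income\n"
    "1,34,52000\n"
    "2,-99,48000\n"
    "3,41,.\n"
    "4,NAN,99\n"
    "5,,61000\n"
)

# Assumed coding scheme: -99, NAN and . mean "missing" for age;
# 99 and . mean "missing" for income. Empty cells are treated as
# missing by pandas' default settings.
missing_codes = {
    "age": ["-99", "NAN", "."],
    "income": ["99", "."],
}

df = pd.read_csv(raw, na_values=missing_codes)

# Count missing values per column after recoding.
missing_counts = df.isna().sum()
print(df)
print(missing_counts)
```

Declaring the codes per column is a deliberate choice: 99 could be a perfectly valid age, so recoding it across the whole table would quietly create missing data that was never there.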
While the best solution for missing data is to avoid it in the first place by developing good data-collection and stewardship policies, often we have to make do with what’s available.
Types of Missing Data
It is usual to define three kinds of missing data:
- Missing completely at random (MCAR).
- Missing at random (MAR).
- Missing not at random (MNAR).
Missing Completely at Random (MCAR):
The reason for missingness is totally independent of the predictors and response. Let’s think about examples. Look at the following dataset. Let’s say we gave a questionnaire out to fill in by hand, and these are the…