
Failure in Handling Missing Data

Zaid Alissa Almaliki
3 min read · Jul 11, 2021


As a data scientist, I often receive data and analyses whose authors fail to describe, analyse, or even acknowledge missing data. This is frustrating, because missing data is often of the utmost importance: conclusions may change once it is accounted for. Some do not even seem to appreciate that conventional regression includes only the rows with complete data.
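To make the last point concrete, here is a minimal sketch of that listwise deletion. The data frame and its columns (`y`, `x1`, `x2`) are made up for illustration, and the fit uses plain NumPy least squares rather than any particular modelling library, so the row dropping is done explicitly with `dropna`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 8 respondents, some values not recorded (NaN)
df = pd.DataFrame({
    "y":  [3.1, 2.4, np.nan, 4.0, 3.3, 2.9, 3.8, 2.2],
    "x1": [1.0, np.nan, 2.0, 3.0, 2.5, 1.5, 2.8, 1.2],
    "x2": [0.5, 0.7, 0.9, np.nan, 0.4, 0.6, 0.3, 0.8],
})

# Listwise deletion: keep only rows with no NaN in any column
complete = df.dropna()
print(f"{len(df)} rows supplied, {len(complete)} used in the fit")  # 8 rows supplied, 5 used in the fit

# Ordinary least squares on the complete rows: y ~ 1 + x1 + x2
X = np.column_stack([np.ones(len(complete)), complete[["x1", "x2"]].to_numpy()])
coef, *_ = np.linalg.lstsq(X, complete["y"].to_numpy(), rcond=None)
print("intercept, b1, b2:", coef)
```

Three of the eight rows never reach the model. If those rows differ systematically from the rest, the coefficients are affected, and nothing in the fitted output says so.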

What is missing data?

We say that data is missing when a value that should have been recorded was not. Missing values may be coded as NaN, a period (.), an empty cell (“”), or a common numeric code (often -99 or 99).
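In pandas, for instance, such codes can be declared when the file is read, so they become proper `NaN` values instead of being treated as real numbers. The CSV text below and its codes are hypothetical. Because `na_values` is given as a list, it applies to every column, so a code like `99` should only be added when it cannot also be a genuine value.

```python
import io
import pandas as pd

# Hypothetical export where ".", an empty cell and -99 all mean "not recorded"
csv_text = """id,age,income
1,34,52000
2,.,48000
3,-99,
4,29,61000
"""

# Empty fields are read as NaN by default; "." and "-99" must be declared
df = pd.read_csv(io.StringIO(csv_text), na_values=[".", "-99"])

# Describe the missingness before analysing anything
print(df.isna().sum())            # missing count per column: id 0, age 2, income 1
print(df.isna().mean().round(2))  # missing share per column
```

Without `na_values`, the `.` would force `age` to be read as text, and `-99` would silently enter any average as a real age.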

While the best solution for missing data is to avoid it in the first place by developing good data-collection and stewardship policies, we often have to make do with what is available.

Types of Missing Data

Missing data is usually divided into three types:

  1. Missing completely at random (MCAR).
  2. Missing at random (MAR).
  3. Missing not at random (MNAR).
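The three types differ in what the missingness depends on: nothing at all (MCAR), other values that are observed (MAR), or the unobserved value itself (MNAR). The simulation below is a sketch with made-up variables (`age`, `income`) and arbitrary missingness rates; it compares the mean of the incomes that remain observed with the true mean under each mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical population: age is always observed, income will get holes
age = rng.normal(45, 12, n)
income = 20_000 + 800 * age + rng.normal(0, 8_000, n)

# MCAR: every income has the same 20% chance of going missing
mcar = rng.random(n) < 0.20
# MAR: missingness depends on the observed age (over-50s skip the question more often)
mar = rng.random(n) < np.where(age > 50, 0.35, 0.10)
# MNAR: missingness depends on the unobserved income itself (high earners skip more often)
mnar = rng.random(n) < np.where(income > np.median(income), 0.35, 0.05)

true_mean = income.mean()
for name, missing in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    observed_mean = income[~missing].mean()
    print(f"{name}: observed mean {observed_mean:,.0f} vs true mean {true_mean:,.0f}")
```

Only the MCAR estimate stays close to the truth. Under MAR the bias can in principle be corrected using the observed `age` (for example by weighting or imputation); under MNAR, the information needed for the correction is exactly what is missing.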

Missing Completely at Random (MCAR):

The reason for missingness is completely independent of both the predictors and the response. Let’s think about an example. Look at the following dataset. Suppose we handed out a questionnaire to be filled in by hand, and these are the…


Written by Zaid Alissa Almaliki

Data Engineer, LinkedIn and Twitter Top Voice. Contributing to leading platforms like Towards Data Science.
