This post is a quick tutorial on filling missing values in Stata variables. It uses the fillmissing program, which can be installed by typing the following command in the Stata command window:

ssc install fillmissing, replace
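To confirm that the installation worked, Stata's built-in which and help commands can be used. This is only a small sketch; it assumes the package ships a help file, as SSC packages normally do.

```stata
* Report where fillmissing.ado was found (gives an error if it is not installed)
which fillmissing

* Open the help file that comes with the package
help fillmissing
```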

Important Note: This post does not imply that filling missing values is justified by theory. Users should make their own decisions and follow appropriate theory while filling missing values.

Once the fillmissing program is installed, we can use it to fill missing values in numeric as well as string variables. The program also allows the bysort prefix, so missing values can be filled within groups. But let us first quickly go through the different options of the program.

Program Options

The fillmissing program offers the following options to fill missing values:

  1. with(any)
  2. with(previous)
  3. with(next)
  4. with(first)
  5. with(last)
  6. with(mean)
  7. with(max)
  8. with(min)
  9. with(median)

Let us quickly go through these options. Please note that options 6 through 9 apply only to numeric variables.

1. with(any)

The with() option specifies the source from which missing values will be filled. with(any) is the default, so fillmissing uses it automatically when with() is not specified. This option is best suited to a constant variable, i.e. a variable whose values should all be identical but some of which are missing for some reason. with(any) tries to fill the missing values from any available non-missing value of the given variable.

Example 1: Fill missing values with(any)

Let us first create a sample dataset of one variable with 10 observations. You can copy and paste the following code into the Stata Do-file Editor to generate the dataset:

clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8

The above dataset has missing values in rows 5 and 8. To fill them from any other available non-missing value, let us use the with(any) option.

fillmissing symbol, with(any)

Since with(any) is the default option of the program, we could also write the above code as

fillmissing symbol
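One way to check the result is to count the missing values before and after the fill. The sketch below rebuilds the Example 1 data and uses only the built-in count and list commands around the fillmissing call; it assumes fillmissing has been installed from SSC.

```stata
* Rebuild the Example 1 data
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8

* Count the gaps before filling (rows 5 and 8)
count if missing(symbol)

* Fill from any available non-missing value
fillmissing symbol, with(any)

* Count the gaps again and inspect the variable
count if missing(symbol)
list symbol
```

The first count reports 2. If the fill works as described above, the second count should report 0.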

2. with(previous)

The with(previous) option fills a missing value with the value from the preceding observation of the same variable. If the preceding value is also missing, the current value remains missing. Note that this option does not sort the data: fillmissing uses whatever order the data are currently in to identify the current and previous observations.
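Because the result depends on the current sort order, it is usually worth sorting the data explicitly before filling. The following is a minimal sketch with made-up data; the variables year and price are hypothetical and are not part of the fillmissing program.

```stata
* Hypothetical data stored in reverse chronological order
clear all
set obs 6
gen year = 2016 - _n
gen price = 100 + _n

* The price for 2013 (row 3) is missing
replace price = . in 3

* Put the data in chronological order before filling
sort year
fillmissing price, with(previous)
list year price
```

With the data sorted by year, the observation before 2013 is 2012, so the gap should be filled with the 2012 price (104). Without the sort, the preceding row would be 2014 and the gap would take that price (102) instead.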

Example 2: Fill missing values with(previous)

Let's create a dummy dataset first.

clear all
set obs 10
gen symbol = "AABS"
replace symbol = "AKBL" in 1
replace symbol = "" in 2

The dataset looks like this:

  +--------+
  | symbol |
  +--------+
  |   AKBL |
  |        |
  |   AABS |
  |   AABS |
  |   AABS |
  |   AABS |
  |   AABS |
  |   AABS |
  |   AABS |
  |   AABS |
  +--------+

To fill the missing value in observation number 2 with AKBL, i.e. from previous observation, we would type:

fillmissing symbol, with(previous)
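For readers who want to see the logic spelled out, the behavior described above can be approximated with built-in Stata commands. This is only an illustrative sketch and not the code inside fillmissing; the helper variable prev_symbol is a name made up for this example. The lagged value is copied into the helper first because a direct replace symbol = symbol[_n-1] if missing(symbol) works through the observations in order and would carry a filled value forward into a following gap, which differs from the rule that a gap stays missing when the previous value is missing.

```stata
* Rebuild the Example 2 data
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "AKBL" in 1
replace symbol = "" in 2

* Store the original previous value of each observation in a helper
gen prev_symbol = symbol[_n-1]

* Fill each gap from the helper, then drop the helper
replace symbol = prev_symbol if missing(symbol)
drop prev_symbol

list symbol in 1/3
```

After these commands, observation 2 holds AKBL, the value of observation 1.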

What's Next

In the next blog post, I shall talk about the other options of the fillmissing program. Specifically, I shall discuss the use of the by and bys prefixes with fillmissing. To follow along, you may visit the blog section of this site or subscribe to updates from this site.
