fillmissing: Fill Missing Values in Stata
This post presents a quick tutorial on how to fill missing values in variables in Stata. The tutorial uses the fillmissing program, which can be downloaded by typing the following command in the Stata command window:
ssc install fillmissing, replace
Important Note: This post does not imply that filling missing values is justified by theory. Users should make their own decisions and follow appropriate theory while filling missing values.
After installing the fillmissing program, we can use it to fill missing values in numeric as well as string variables. The program also allows the bysort prefix, so missing values can be filled by groups. We shall see several examples of using the bysort prefix to perform by-group calculations. But let us first quickly go through the different options of the program.
Program Options
The fillmissing program offers the following options to fill missing values
1. with(any)
2. with(previous)
3. with(next)
4. with(first)
5. with(last)
6. with(mean)
7. with(max)
8. with(min)
9. with(median)
Let us quickly go through these options. Please note that the options from number 6 onward in the list above (mean, max, min, and median) apply only to numeric variables.
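To make the numeric-only options more concrete, here is a minimal sketch of what a mean fill amounts to, written with built-in Stata commands only. It is not the fillmissing source code, and the variable names price and price_mean are made up for illustration.

```stata
* Illustration only: a mean fill done by hand with built-in commands
clear all
set obs 5
gen price = _n * 10                  // 10, 20, 30, 40, 50
replace price = . in 3               // make observation 3 missing

* egen mean() ignores missing values, so this is the mean of 10, 20, 40, 50
egen double price_mean = mean(price)

* Copy the mean into the missing observation only
replace price = price_mean if missing(price)
drop price_mean
list price
```

A mean, maximum, minimum, or median is not defined for text, which is why such options only make sense for numeric variables.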
1. with(any)
The with() option specifies the source from which the missing values will be filled. with(any) is the default, so if no option is specified, fillmissing invokes it automatically. This option is best suited to filling missing values of a constant variable, i.e. a variable whose values are all the same but some of which are missing for some reason. with(any) will try to fill the missing values from any available non-missing value of the given variable.
Example 1: Fill missing values with(any)
Let us first create a sample dataset of one variable with 10 observations. You can copy and paste the following code into the Stata Do-file Editor to generate the dataset:
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8
The above dataset has missing values in rows 5 and 8. To fill the missing values from any other available non-missing value, let us use the with(any) option.
fillmissing symbol, with(any)
Since with(any) is the default option of the program, we could also write the above code as
fillmissing symbol
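One way to check the result is to count the missing values before and after the fill. The sketch below is meant to be run from a do-file and assumes fillmissing has already been installed from SSC. It only uses the syntax shown above plus the built-in count command.

```stata
* Assumes fillmissing is already installed (ssc install fillmissing)
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8

count if missing(symbol)      // empty strings before filling
fillmissing symbol            // with(any) is the default
count if missing(symbol)      // empty strings after filling
```

For string variables, missing() treats the empty string "" as missing, so the same check works for numeric and string variables alike.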
2. with(previous)
The with(previous) option fills the current missing value with the preceding (previous) value of the same variable. Please note that if the previous value is also missing, the current value will remain missing. Further, this option does not sort the data: fillmissing uses whatever the current sort order is to identify the current and previous observations.
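Because the current sort order decides which observation counts as "previous", it is usually worth sorting explicitly before the fill. The following is a minimal sketch with a made-up dataset (the variables year and price are hypothetical) and assumes fillmissing is installed.

```stata
* Assumes fillmissing is already installed (ssc install fillmissing)
clear all
input year price
2003 .
2001 10
2002 20
end

* As entered, the missing price is in the first row and has no previous value.
* Sorting by year puts the 2002 observation directly before the 2003 one.
sort year
fillmissing price, with(previous)
list
```

Given the behaviour described above, the 2003 price should be filled from the 2002 observation once the data are sorted by year.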
Example 2: Fill missing values with(previous)
Let's create a dummy dataset first.
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "AKBL" in 1
replace symbol = "" in 2
The dataset looks like this
+--------+
| symbol |
+--------+
| AKBL   |
|        |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
+--------+
To fill the missing value in observation number 2 with AKBL, i.e. from the previous observation, we would type:
fillmissing symbol, with(previous)
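For comparison, the same one-step fill can be sketched with built-in commands only. This is an illustration of the behaviour described above (a missing value is filled only when the observation just before it is non-missing), not the fillmissing source code. The helper variable prev is made up for this example.

```stata
* Illustration only: a one-step "previous value" fill with built-in commands
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "AKBL" in 1
replace symbol = "" in 2

* Store each observation's predecessor first, using the original values
gen prev = symbol[_n-1]

* Fill from the stored predecessor; a missing predecessor leaves the gap as is
replace symbol = prev if missing(symbol)
drop prev
list symbol in 1/3
```

Storing the lagged value in a separate variable is a deliberate choice here. Writing replace symbol = symbol[_n-1] if missing(symbol) directly would work through the observations in order and carry a value forward across a whole run of missing observations, which is different from the behaviour described for with(previous).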
What's Next
In the next blog post, I shall talk about the other options of the fillmissing program. Specifically, I shall discuss the use of by and bys with the fillmissing program. You may visit the blog section of this site or subscribe to updates from this site.
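As a brief preview of that by-group usage, here is a minimal sketch that combines the bysort prefix with with(mean). The dataset and the variable names symbol, year, and price are made up for illustration, and the example assumes fillmissing is installed.

```stata
* Assumes fillmissing is already installed (ssc install fillmissing)
clear all
input str4 symbol year price
"AABS" 2001 10
"AABS" 2002 .
"AABS" 2003 14
"AKBL" 2001 20
"AKBL" 2002 .
"AKBL" 2003 30
end

* Fill each missing price from the mean of its own group
bysort symbol: fillmissing price, with(mean)
list, sepby(symbol)
```

With the by-group prefix, each missing price should be filled from the mean of its own symbol (12 for AABS and 25 for AKBL in this toy data) rather than from the overall mean.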
12 Comments
Source: https://fintechprofessor.com/2019/12/20/fillmissing-fill-missing-values-in-stata/
dear dr please check your email I have asked about Corporate governance data.. detail is in my email
Dear Dr. Hassan Raz
I have converted the site to https protocol, therefore, you may try this method.
Very useful command, thanks. Would be helpful to have a help file installed along with the package itself for future reference. I can also confirm this works with the "bysort" command (in my Stata 15), which is exactly what I needed it to be able to do.
Dear, I have a question when using this fillmissing code in stata.
Example:
This command uses the average of the group, but I would like to use the average of the previous variable and the posterior variable to replace the missing, keeping the limits within each group.
Example of the database with missing
Example that I would like to arrive using the fillmissing code
I hope you can help me.
My best regards.
I could not understand the requirements. The data you have posted and the fillmissing command that you have used do not match. Can you please clarify it a bit further on what to use for filling the missing values?
Dear,
I am sorry for the lack of clarity in the explanation.
The original database consists of a panel, with more than 100 importing and 100 exporting countries, organized in pairs. The dependent variable is import flow and the independent variable is tariff.
The following database is similar to the original
Command:
Result with the above command
This command uses the average of the group, but I would like to use the average of the previous variable and the posterior variable to replace the missing, keeping the limits within each group (BRA USA; USA BRA; and so on).
My expected result would be a dataset similar to the one below:
The asdoc and fillmissing commands are very useful and help a lot in the job.
Excuse me for the inconvenience.
My best regards.
I think it is an interesting problem and will need recursive loops. I have added this option to fillmissing now.
Hello Dr Attaullah Shah;
I want the fillmissing program to solve missing value problems with the with(mean) with panel data.
Thank you for this – it was really helpful!
I am new to stata and want to run interindustry volatility spillover. can you please guide me in this regard?