
Making sense of highly complex data

By Rebecca Tan

SMU Office of Research & Tech Transfer – Researchers from Singapore Management University (SMU) have developed a method that can produce more accurate estimates by incorporating a large number of control variables. Their findings, published in the Journal of Econometrics, were used to estimate intergenerational income elasticity, showing that father’s income had a positive effect on son’s income, but only for certain ethnicities.

“In the case of intergenerational income elasticity, for example, we are asking the question: If the father’s income increases by one percent, what is the percentage change of the son’s income?” said study co-author Assistant Professor Zhang Yichong of the SMU School of Economics.
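As a rough illustration of the quantity Professor Zhang describes, elasticity can be read as the slope of a regression of log son's income on log father's income. The sketch below is not the researchers' method: the function name `income_elasticity`, the synthetic data and the "true" elasticity of 0.4 are all assumptions made purely for illustration.

```python
import numpy as np

def income_elasticity(father_income, son_income):
    """Slope of log(son income) on log(father income), fitted by least squares.

    A slope of b means a 1% rise in father's income is associated
    with roughly a b% change in son's income.
    """
    x = np.log(np.asarray(father_income, dtype=float))
    y = np.log(np.asarray(son_income, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# Synthetic (made-up) incomes generated with an elasticity of 0.4.
rng = np.random.default_rng(0)
father = np.exp(rng.normal(10.0, 0.5, size=5000))
son = np.exp(2.0 + 0.4 * np.log(father) + rng.normal(0.0, 0.3, size=5000))

estimate = income_elasticity(father, son)
print(f"estimated elasticity: {estimate:.3f}")  # close to 0.4 by construction
```

This simple regression ignores covariates and unobserved factors entirely, which is exactly the gap the study sets out to address.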

Although the question is simple to pose, it is difficult to answer, as the treatment (father’s income) has both direct and indirect effects on the outcome (son’s income). For example, a father’s income could plausibly affect his son’s level of education attained, which in turn influences how much the son earns. Thankfully, by measuring the son’s level of education and accounting for it in the model as a covariate, such indirect effects can be cancelled out.
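The education example can be sketched with simulated data. In the hypothetical setup below, father's income raises the son's education, and both affect the son's income. All coefficients (0.2 for the direct effect, 0.3 for education) and variable names are assumptions chosen for illustration, and ordinary least squares stands in for whatever estimator a real study would use.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Hypothetical data-generating process (all numbers are illustrative).
log_father = rng.normal(10.0, 0.5, size=n)
education = 1.0 * log_father + rng.normal(0.0, 1.0, size=n)  # indirect channel
log_son = (1.0 + 0.2 * log_father + 0.3 * education
           + rng.normal(0.0, 0.3, size=n))

def ols(columns, y):
    """Least-squares coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Without the covariate, the slope mixes direct and indirect effects:
# true value 0.2 + 0.3 * 1.0 = 0.5.
total_effect = ols([log_father], log_son)[1]

# With education included as a covariate, the slope isolates the
# direct effect: true value 0.2.
direct_effect = ols([log_father, education], log_son)[1]

print(f"without education: {total_effect:.3f}")
print(f"with education:    {direct_effect:.3f}")
```

The gap between the two slopes is the part of the father's influence that runs through education in this toy setup.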

The problem, however, arises when the number of covariates becomes very large, a setting known as high-dimensional. When faced with so many covariates, traditional econometric models break down and are unable to produce estimates.
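The article does not spell out why traditional models fail, but one well-known reason is that ordinary least squares has no unique solution once there are more covariates than observations. The sketch below demonstrates that on random data. The sample sizes and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_covariates = 50, 200  # more covariates than observations

X = rng.normal(size=(n_obs, n_covariates))
y = rng.normal(size=n_obs)

# The rank of X cannot exceed the number of observations, so the
# 200 x 200 matrix X'X is singular and cannot be inverted.
rank = np.linalg.matrix_rank(X)

# lstsq still returns the minimum-norm coefficient vector, but it is
# only one of infinitely many that fit the data equally well.
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding any direction that X maps to zero leaves the fit unchanged.
_, _, vt = np.linalg.svd(X)
null_direction = vt[-1]  # orthogonal to every row of X
beta_other = beta_min_norm + 5.0 * null_direction

same_fit = np.allclose(X @ beta_min_norm, X @ beta_other)
print(f"rank of X: {rank} (covariates: {n_covariates})")
print(f"two different coefficient vectors, same fit: {same_fit}")
```

Because very different coefficient vectors explain the data equally well, the classical estimator cannot single out one answer, which is why high-dimensional settings need additional structure or regularisation.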

In this study, Professor Zhang and his colleagues Professor Su Liangjun from SMU and Assistant Professor Takuya Ura of the University of California, Davis, developed a three-step method that can deal with high-dimensional covariates.

The researchers first tested their model on simulated data before applying it to the 1979 National Longitudinal Survey of Youth, a rich dataset of nearly 10,000 individuals. “We used over a hundred covariates, [which are] essentially demographic variables such as the language spoken at home or whether they lived in the city,” Professor Zhang explained. “We found that the elasticity is positive for Caucasians, but in the African-American subsample, the effect was insignificant.”

Apart from accounting for over a hundred covariates, the model also considers unobserved factors that impact the son’s income, such as leadership and interpersonal skills. It does this using what is called a non-separable model, where unobserved factors affect the outcome in a nonlinear and nonadditive manner.

“In order to eliminate indirect but unobserved factors such as interpersonal skills, we had to rely on the assumption of unconfoundedness,” Professor Zhang explained. “We plan on relaxing this condition for future research so that we can control for high-dimensional data observations without needing to make the assumption of unconfoundedness.”

For more information, please contact:

Goh Lijie (Ms)
Office of Research & Tech Transfer
DID: 6828 9698
Email: ljgoh [at] smu.edu.sg 

Back to Research@SMU Aug 2019 Issue