Daniel Dass
Department, Institution: Economics, UCL
UBEL Pathway: Economics
Supervisor: Stephen Hansen
Contact details: d.r.dass26@gmail.com
About Me
I hold an undergraduate degree from the University of Cambridge and an MSc from UCL, both in economics, and I have over five years of experience as a professional economist. Outside economics, I enjoy being out in nature, dancing like nobody's watching, and playing sport.
My Research
Aim
This PhD project will establish a novel statistical framework for applying natural language processing (NLP) algorithms to text data to measure economic concepts of interest and using the outputs of these algorithms in downstream econometric analysis.
Objectives
- Develop a framework for analysing the statistical properties of different NLP approaches, improving researchers’ ability to judge the appropriateness of a method in their context, conduct valid inference (defined as ensuring unbiased and consistent estimation and constructing correct confidence intervals), and communicate their findings to stakeholders.
- Apply the framework to a specific research question in economics (to be identified during the first year of the PhD).
- Produce a step-by-step process for using this framework.
- Create programming packages in R and Python for easy implementation.
Current challenges
NLP methods are used across economics, but they are applied in the absence of any formal statistical guarantees. There are two key challenges for economists using text data:
- Performing valid inference
In general, researchers treat the upstream NLP measurement task and the downstream econometric analysis separately. This creates problems for statistical inference: the downstream econometric model does not account for uncertainty in the text algorithm's output, and it ignores any statistical relationships between the downstream variables and the text data. Both factors can introduce measurement error and bias the downstream estimates.
- Choosing the best NLP approach
A researcher has many options when it comes to measuring their concept of interest. Each method produces different outputs, which can affect the downstream econometric analysis and lead to meaningful differences in estimated treatment effects or predictions.
The process for choosing between models in the economics literature is judgement-based and generally less rigorous than other areas of econometrics. This can lead to inconsistent research, where differences in results across studies are driven by opaque analyst judgements rather than the underlying fundamentals of the research.
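The inference problem above can be illustrated with a small simulation. The sketch below (hypothetical, not taken from the project itself; all variable names and parameter values are assumptions) shows the classical measurement-error effect: when a noisy NLP-derived measure stands in for the true latent concept in a simple regression, the downstream OLS slope is attenuated towards zero.

```python
import random
import statistics

# Hypothetical illustration of measurement error from a two-step pipeline:
# an NLP algorithm measures a latent concept with noise, and the downstream
# regression ignores that noise, biasing the slope estimate towards zero.

random.seed(42)
n = 50_000
beta = 2.0       # true downstream effect of the latent text concept
noise_sd = 1.0   # sd of the NLP measurement error (assumed value)

latent = [random.gauss(0, 1) for _ in range(n)]              # true concept
y = [beta * x + random.gauss(0, 0.5) for x in latent]        # outcome
measured = [x + random.gauss(0, noise_sd) for x in latent]   # NLP output

def ols_slope(xs, ys):
    """Simple-regression slope estimate: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys)) / (len(xs) - 1)
    return cov / statistics.variance(xs)

print(f"slope using true concept: {ols_slope(latent, y):.2f}")   # close to 2.0
print(f"slope using NLP measure:  {ols_slope(measured, y):.2f}")  # close to 1.0
```

With unit-variance latent concept and unit-variance measurement noise, the expected attenuation factor is var(x) / (var(x) + var(error)) = 0.5, so the naive two-step estimate recovers roughly half the true effect. A joint model of the text measurement and the downstream outcome is one way to correct for this.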
Novel contribution
My statistical framework will address both issues. It will jointly model the text data and the downstream economic variables, providing guidance on the conditions for, and threats to, consistent and unbiased estimation, and on how to construct valid confidence intervals. The framework will also help researchers judge the trade-offs between NLP approaches, for example, which approach is most likely to produce a consistent estimate in a given study context.
Impact of My Research
Impact
NLP methods are used widely across economics, but they are applied in the absence of any formal statistical guarantees. Consequently, economists cannot reliably inform decision makers about the uncertainty, statistical significance or potential biases of their findings.
My research project will address this limitation by establishing a statistical framework for applying natural language processing (NLP) algorithms to text data and using the outputs of these algorithms in downstream econometric analysis. This will enable economic researchers to make the most of the exciting opportunity presented by the rapid growth in text data and accompanying tools of analysis.
My research will help researchers harness text data and have impact in four ways:
- Improved decision-making – it will enable valid inference (including consistent and unbiased estimation and accurate confidence intervals) on the results of analysis. In doing so, it will help economists to communicate the findings and limitations of their research to policymakers.
- Improved quality – it will improve the credibility and reliability of research, and help researchers understand the trade-offs between methods and make an informed judgement on the best approach to take.
- Transparency and consistency – it will provide researchers with a shared framework for choosing between NLP approaches. Currently, the process for choosing between NLP models is judgement-based, introducing bias and leading to inconsistent outputs. My research will bring NLP research in line with the rigour expected in other areas of econometrics.
- Accessibility – by providing tools for easy implementation, including programming packages in R and Python, and supporting documents to guide practitioners, my research will be maximally useful to academic, private sector, and government researchers.
Whilst my research will focus on applications in economics, I anticipate the framework will extend to other social science disciplines, furthering its impact.