NLP for Better ESG Investing

How Deutsche Bank weeds out greenwashing

  • Deutsche Bank wanted to screen for companies that live up to their ESG commitments. 
  • Large companies’ higher ESG ratings may be linked to their lengthier sustainability reports.
  • Topic Modeling is a useful NLP concept that links the words in a text to its topic.
  • The bank used Topic Modeling to spot companies likely to achieve emissions reductions.

Not all ESG data is gold

Demand from both large pension funds and individual investors to factor Environmental, Social and Governance (ESG) data into investment decisions has been growing. Some evidence also suggests this added consideration helps deliver superior portfolio performance.

As the figure below shows, the vast majority of S&P 500 companies now publish sustainability reports. However, translating non-financial ESG information into actionable data can be tricky.

A big part of the problem is ‘greenwashing’, whereby companies publish voluminous sustainability disclosures that are ultimately opaque and meaningless.

Deutsche Bank found that larger-cap companies tended to receive higher overall ESG ratings, possibly because large firms have greater resources to write lengthy reports.

The bank decided to develop alternative ways to analyze sustainability reports, to gauge whether companies are truly aligning their businesses with sustainable practices.

NLP Concept: Topic Modeling

A text’s topic is thought of as the key idea the text expresses. While certain words are likely to appear in texts irrespective of their topic (for example, ‘and’), other words are likely to relate to some ideas more than to others. ‘Coffee’, for example, more commonly relates to ‘food’ than it does to ‘banking’.
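The idea that some words lean towards particular topics while others, like ‘and’, do not can be made concrete by measuring how often each word occurs in texts about a given topic. The sketch below is illustrative only: it assumes a tiny hypothetical corpus whose topic labels are already known, whereas a real topic model has to infer the topics itself.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (topic label, text) pairs. The labels are given
# here for illustration; a topic model would have to infer them.
CORPUS = [
    ("food", "coffee and cake"),
    ("food", "fresh coffee and bread"),
    ("banking", "loans and deposits"),
    ("banking", "interest rates and loans"),
]

def word_given_topic(corpus):
    """Estimate P(word | topic) as the word's share of all tokens in that topic."""
    counts = defaultdict(Counter)
    for topic, text in corpus:
        counts[topic].update(text.lower().split())
    probs = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        probs[topic] = {word: n / total for word, n in counter.items()}
    return probs

probs = word_given_topic(CORPUS)
for topic in sorted(probs):
    # 'and' occurs under both topics; 'coffee' occurs only under 'food'.
    print(topic, probs[topic].get("and", 0.0), probs[topic].get("coffee", 0.0))
```

Here ‘and’ takes the same share of tokens under both topics (2 of 7), while ‘coffee’ appears only under ‘food’, which is the kind of signal topic models exploit.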

Topic Modeling refers to the set of problems that involve linking the topics of texts to the keywords associated with those topics.

One such problem is identifying groups of keywords that commonly appear together, to assess the topics of a set of documents.

For example, if analysis shows that the words ‘cost’, ‘emissions’, ‘economy’, ‘economic’ and ‘industry’ commonly appear together in a group of articles, one might infer that these articles all discuss the economic impacts of climate change. A different problem is identifying which articles cover specific topics.
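Full topic models such as Latent Dirichlet Allocation find these keyword groups probabilistically. A much simpler stand-in, sketched below, just counts how many documents each pair of keywords shares. The documents and stopword list are hypothetical, and this is a plain co-occurrence count rather than a topic model.

```python
from collections import Counter
from itertools import combinations

# Hypothetical article snippets; in practice these would be report sections.
DOCS = [
    "the cost of emissions weighs on the economy and industry",
    "economic cost of cutting emissions for heavy industry",
    "industry emissions and the wider economy",
    "new coffee shop opens downtown",
]

# Hypothetical stopword list: words that appear regardless of topic.
STOPWORDS = {"the", "of", "on", "and", "for", "new"}

def cooccurring_pairs(docs, min_docs=2):
    """Return keyword pairs that appear together in at least `min_docs` documents."""
    pair_counts = Counter()
    for doc in docs:
        # A sorted set, so each word counts once per document and pairs are ordered.
        words = sorted({w for w in doc.lower().split() if w not in STOPWORDS})
        pair_counts.update(combinations(words, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_docs}

for pair, n in sorted(cooccurring_pairs(DOCS).items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, n)
```

Pairs such as (‘emissions’, ‘industry’) surface because they recur across the climate-related snippets, while the unrelated coffee-shop snippet contributes no recurring pairs.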

In this case, Deutsche Bank’s researchers analyzed a set of sustainability reports to identify keywords that commonly appear in sections covering each of five carbon-related topics. These keywords then helped flag similar sections in many other reports.
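Once each topic has a set of associated keywords, flagging a new section can be as simple as checking which topic's keywords it mentions most. The sketch below assumes hypothetical keyword lists for two of the topics (the bank's actual topics and keywords are not reproduced here) and uses plain overlap counting, a simplification of how a trained topic model would score a section.

```python
import re

# Hypothetical keyword lists for illustration; not the bank's actual keywords.
TOPIC_KEYWORDS = {
    "mitigation": {"reduce", "reduction", "emissions", "renewable", "efficiency"},
    "adaptation": {"resilience", "flood", "adapt", "drought", "risk"},
}

def flag_topic(section, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic whose keywords appear most in `section`, or None if none appear.

    Ties go to the topic listed first in `topic_keywords`.
    """
    words = set(re.findall(r"[a-z]+", section.lower()))
    scores = {topic: len(words & keywords) for topic, keywords in topic_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(flag_topic("We aim to reduce emissions through renewable energy and efficiency."))
print(flag_topic("Our sites are assessed for flood and drought risk to build resilience."))
```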

As a result, researchers were able to spot whether companies’ sustainability reports had sections focusing primarily on the topics of ‘mitigation’ and ‘adaptation’, and to compare this with the companies’ success at achieving emissions-reduction goals.
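Comparing companies on this dimension needs a per-report measure of focus. One simple, hypothetical choice, sketched below, is the share of a report's sections labeled ‘mitigation’ or ‘adaptation’. The company names and labels are made up, and the bank's actual scoring is not described in detail.

```python
# Hypothetical per-company section labels, e.g. produced by a keyword-based flagger.
SECTION_TOPICS = {
    "CompanyA": ["mitigation", "adaptation", "other", "mitigation"],
    "CompanyB": ["other", "other", "mitigation"],
    "CompanyC": ["adaptation", "other", "other", "other"],
}

FOCUS_TOPICS = {"mitigation", "adaptation"}

def focus_share(labels):
    """Fraction of a report's sections labeled with a focus topic (0.0 if no sections)."""
    if not labels:
        return 0.0
    return sum(label in FOCUS_TOPICS for label in labels) / len(labels)

# Rank companies from most to least focused on mitigation and adaptation.
ranking = sorted(SECTION_TOPICS, key=lambda c: focus_share(SECTION_TOPICS[c]), reverse=True)
print(ranking)
```

Using a share rather than a raw count keeps a long report from ranking highly simply because it has more sections, which matters given the link between report length and ESG ratings noted earlier.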

Topic modeling helps detect greenwashing

Deutsche Bank decided to investigate whether the commitments firms make to reducing carbon emissions were associated with actual sustainability performance.

Analyzing carbon-related discussions within the reports, researchers used topic modeling to identify five distinct topics, along with the top keywords associated with each. The table below shows the topics and their top keywords.

Companies were ranked based on their focus on the mitigation and adaptation topics. The system also scanned for numbers and quantitative words (like ‘first’ and ‘half’), and for active (vs. passive) language.
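The language checks described above could be approximated with simple token rules. The sketch below is an assumption-heavy illustration, not the bank's system: the quantitative word list is hypothetical (seeded with ‘first’ and ‘half’ from the text), and passive voice is detected crudely as a form of ‘to be’ followed by a word ending in ‘-ed’, which misses irregular participles such as ‘made’.

```python
import re

# Hypothetical quantitative words; 'first' and 'half' come from the text.
QUANT_WORDS = {"first", "half", "double", "third", "quarter", "twice"}
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}

def language_features(text):
    """Return the share of numeric/quantitative tokens and a crude passive-voice count."""
    # Tokens are lowercase words or numbers, optionally with a decimal part or '%'.
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?%?", text.lower())
    if not tokens:
        return {"numeric_share": 0.0, "passive_count": 0}
    numeric = sum(1 for t in tokens if t[0].isdigit() or t in QUANT_WORDS)
    # Crude passive check: a form of "to be" immediately followed by an "-ed" word.
    passive = sum(
        1 for a, b in zip(tokens, tokens[1:]) if a in BE_FORMS and b.endswith("ed")
    )
    return {"numeric_share": numeric / len(tokens), "passive_count": passive}

print(language_features("We will cut emissions by half by 2030 and have reached 40% renewable power."))
print(language_features("Emissions are expected to be reduced where possible."))
```

The first sentence scores as numeric and active; the second contains no numbers and two passive constructions.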

The bank found that companies using highly active and numeric language have, on average, a 74% chance of reducing their future emissions. Also, companies that frequently discuss mitigating or adapting to climate change have a 65% higher chance of achieving reductions.
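The two results are stated differently: 74% reads as an absolute probability for one group, while ‘65% higher’ reads as a relative comparison with some baseline group. The sketch below shows how each kind of figure could be computed from outcome data. The records are invented for illustration, and treating unflagged companies as the baseline is an assumption, since the text does not name the comparison group.

```python
# Hypothetical records: (company flagged by the screen, later reduced emissions).
RECORDS = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, True), (False, False),
]

def reduction_rate(records, flagged):
    """Share of companies with the given flag value that went on to reduce emissions."""
    outcomes = [reduced for flag, reduced in records if flag == flagged]
    return sum(outcomes) / len(outcomes)

p_flagged = reduction_rate(RECORDS, True)    # absolute chance for flagged companies
p_baseline = reduction_rate(RECORDS, False)  # absolute chance for the assumed baseline
relative_uplift = p_flagged / p_baseline - 1  # "X% higher chance" relative to baseline

print(p_flagged, p_baseline, relative_uplift)
```

With these made-up records, flagged companies have a 75% chance of reducing emissions and the baseline a 50% chance, so the flagged group has a 50% higher chance in relative terms even though the absolute gap is 25 percentage points.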

How finance uses NLP