Political Economy of Sustainable Finance Reporting and Data

The integration of sustainability concerns into financial products and financial regulation depends on data. Without underlying data, neither “Net-Zero transition” Exchange Traded Funds nor central bank stress tests of climate risks could exist.
However, sustainability data is not simply “out there” and readily available. Instead, it is shaped by, among other things, historical legacies and biases, market dynamics, and regulation. This project investigates conceptually and empirically how these and other factors constitute a Political Economy of Sustainable Finance Data. This political economy conditions which sustainability aspects financial institutions can assess and govern, and which remain unseen.

Contributions to this project address, among other topics:

  • Mergers and acquisitions in the sustainable finance data market, showing geographical imbalances and concentration effects
  • The lasting impact of regulatory choices and early market practices in establishing “corporate-centric” sustainability data as the norm at the expense of geographic data aggregations
  • The assessment of regulatory debates about scope and format of sustainability reporting from a data quality and data utility perspective
  • The mapping of data sources and data transformations that occur at various stages in the context of sustainable finance data, as well as of the organizations involved in them
  • A critical appraisal of indicators, thresholds, aggregation methods and other data transformation methods that are employed to evaluate sustainability risks and impacts
  • The creation of gold standard benchmark data for sustainability reporting, as well as of alternative, spatially embedded experimental datasets and indicators
  • The automated extraction of georeferenced asset-level information from corporate reporting through Natural Language Processing

Mergers and acquisitions in the sustainable finance data market, showing geographical imbalances and concentration effects

Data on sustainability-related risks and impacts of companies comes from various sources, including corporate reports, web scraping and satellite images. These different data sources are compiled, structured and aggregated by data vendors specializing in Environmental, Social and Governance (ESG) information and then sold to data users, including financial institutions, regulators and academics. The first specialized data vendors were small firms from different backgrounds (news organizations, religious investors, …) that often covered only one country and focused on particular topics. Through several waves of mergers and acquisitions, however, the market for ESG information has consolidated. As illustrated in the visualization of the ESG firms dataset (ESGfida), large global (though often US-based) vendors of financial data like MSCI, Moody’s or S&P have acquired many smaller firms. From a data perspective, these market developments matter: knowing the business models, strategies and clients of data vendors provides context for interpreting why certain measurements and aggregate metrics were developed while others have been discontinued.
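One way to make consolidation visible in acquisition data is to resolve each acquired firm to its ultimate parent and then count firms and cross-border flows per group. The following is a minimal sketch under stated assumptions: the records, firm names (NicheRaterA, GlobalVendorG, …) and headquarters codes are hypothetical placeholders, not entries from ESGfida, and each firm is assumed to have been acquired at most once.

```python
from collections import Counter

# Hypothetical acquisition records (acquired_firm, acquirer). Placeholder names,
# not entries from ESGfida. Assumes each firm was acquired at most once.
ACQUISITIONS = [
    ("NicheRaterA", "RegionalVendorX"),
    ("NicheRaterB", "RegionalVendorX"),
    ("RegionalVendorX", "GlobalVendorG"),
    ("NicheRaterC", "GlobalVendorG"),
    ("NicheRaterD", "GlobalVendorH"),
]

# Hypothetical headquarters countries (ISO 3166-1 alpha-2 codes).
HEADQUARTERS = {
    "NicheRaterA": "DE", "NicheRaterB": "FR", "RegionalVendorX": "GB",
    "NicheRaterC": "JP", "NicheRaterD": "DE",
    "GlobalVendorG": "US", "GlobalVendorH": "US",
}


def ultimate_parent(firm, acquirer_of):
    """Follow acquirer links until reaching a firm that was not itself acquired."""
    seen = set()
    while firm in acquirer_of:
        if firm in seen:  # guard against erroneous cyclic records
            raise ValueError(f"cycle in acquisition chain at {firm!r}")
        seen.add(firm)
        firm = acquirer_of[firm]
    return firm


def absorbed_firms_per_group(records):
    """Count how many acquired firms ultimately sit under each parent group."""
    acquirer_of = dict(records)
    return Counter(ultimate_parent(target, acquirer_of) for target, _ in records)


def country_flows(records, hq):
    """Count (target country, ultimate parent country) pairs across all acquisitions."""
    acquirer_of = dict(records)
    return Counter((hq[t], hq[ultimate_parent(t, acquirer_of)]) for t, _ in records)


print(absorbed_firms_per_group(ACQUISITIONS))
print(country_flows(ACQUISITIONS, HEADQUARTERS))
```

Following chains rather than only direct acquirers matters because a regional vendor that bought niche raters may itself later be absorbed by a global group; in this toy data, four of the five acquired firms end up under one US-headquartered parent.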

The lasting impact of regulatory choices and early market practices in establishing “corporate-centric” sustainability data as the norm at the expense of geographic data aggregations

Early voluntary initiatives to promote and standardize companies’ sustainability reporting sought to emulate traditional financial reporting in order to increase the familiarity and perceived relevance of the disclosed information. Regulations making corporate sustainability reporting mandatory continued this practice and likewise emphasized financial institutions as the main users. These developments have resulted in a “corporate-centric” way of reporting that takes the legal and organizational structure of the (consolidated) company as its main unit of analysis. In the context of sustainability, however, this focus creates conceptual and practical problems. Among these are potentially misleading aggregations and the inability to link and verify disclosed information. In addition, stakeholders who play a critical role in the transition towards a sustainable economy in a specific geographical space (e.g. ministries, municipalities, regional planning bureaus) are precluded from using the data. In light of these shortcomings, we outline policy recommendations that put greater emphasis on georeferenced disclosures in the EU regulatory framework.
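The difference between the two units of analysis can be illustrated with a toy example: the same asset-level records aggregated once by company and once by region. This is a minimal sketch; the companies, regions and emission figures are invented for illustration, and real asset-level data would need far richer attributes.

```python
from collections import defaultdict
from typing import NamedTuple


class Asset(NamedTuple):
    company: str
    region: str
    emissions_t: float  # annual emissions in tCO2e


# Hypothetical asset-level records; numbers are illustrative only.
ASSETS = [
    Asset("CompanyA", "Region North", 120.0),
    Asset("CompanyA", "Region South", 30.0),
    Asset("CompanyB", "Region North", 50.0),
    Asset("CompanyB", "Region East", 200.0),
]


def aggregate(assets, unit):
    """Sum emissions by the given unit of analysis: "company" or "region"."""
    totals = defaultdict(float)
    for asset in assets:
        totals[getattr(asset, unit)] += asset.emissions_t
    return dict(totals)


by_company = aggregate(ASSETS, "company")  # corporate-centric view
by_region = aggregate(ASSETS, "region")    # view a regional planner would need
print(by_company)  # {'CompanyA': 150.0, 'CompanyB': 250.0}
print(by_region)   # {'Region North': 170.0, 'Region South': 30.0, 'Region East': 200.0}
```

Both views sum to the same 400 tCO2e, but only the asset-level records can produce the regional one: a consolidated total of 150 tCO2e for CompanyA cannot be split back into its regional parts.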

The automated extraction of georeferenced asset-level information from corporate reporting through Natural Language Processing

Even though there is no mandatory reporting of georeferenced sustainability information at scale, companies may still disclose data concerning a particular location in other contexts (e.g. materiality and risk assessments, acquisitions, divestments, capital expenditure, new technologies). These disclosures are, however, “hidden” in unstructured text, tables and graphs. To gauge in which contexts and how often companies discuss their geographically specific assets in their reporting, we apply a Named Entity Recognition pipeline with a dependency parser to corporate reports in PDF format. The pipeline extracts sentences that combine mentions of Geopolitical Entities (cities, regions, nation-states) with mentions of customizable asset keywords (e.g. plant, factory, office).
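The sentence-selection step can be sketched in plain Python, assuming entity annotations have already been produced upstream by an NER model that labels geopolitical entities as "GPE" (as spaCy's English models do). This is not the project's pipeline: the `Sentence` type, the helper names and the sample sentences are hypothetical, the dependency-parse step is replaced by simple co-occurrence within a sentence, and keyword matching uses naive plural stripping instead of lemmatization.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    # (entity_text, label) pairs from an upstream NER model (assumed, not shown)
    entities: list[tuple[str, str]] = field(default_factory=list)


ASSET_KEYWORDS = {"plant", "factory", "office"}  # customizable keyword list
WORD = re.compile(r"[a-z]+")


def asset_terms(text, keywords):
    """Asset keywords found in the text (naive plural stripping, not lemmatization)."""
    found = set()
    for word in WORD.findall(text.lower()):
        if word in keywords:
            found.add(word)
        elif word.endswith("s") and word[:-1] in keywords:
            found.add(word[:-1])
    return found


def extract_asset_mentions(sentences, keywords=ASSET_KEYWORDS):
    """Keep sentences that mention both a geopolitical entity (GPE) and an asset keyword."""
    mentions = []
    for sent in sentences:
        places = [text for text, label in sent.entities if label == "GPE"]
        assets = asset_terms(sent.text, keywords)
        if places and assets:
            mentions.append({"sentence": sent.text, "places": places,
                             "assets": sorted(assets)})
    return mentions


# Invented example sentences with hand-written entity annotations.
SAMPLE = [
    Sentence("We opened a new plant in Monterrey, Mexico.",
             [("Monterrey", "GPE"), ("Mexico", "GPE")]),
    Sentence("Revenue in Germany grew strongly.", [("Germany", "GPE")]),
    Sentence("The offices in Lyon were closed.", [("Lyon", "GPE")]),
    Sentence("A new factory is planned."),
]

for mention in extract_asset_mentions(SAMPLE):
    print(mention["places"], mention["assets"], "|", mention["sentence"])
```

Only the first and third sentences are kept: the second names a place but no asset, the fourth an asset but no place. A dependency parser would additionally check that the place is grammatically attached to the asset rather than merely co-occurring with it.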

Further analysis of these extracted asset mentions makes it possible to investigate a range of questions, including whether company and report attributes influence the frequency of geographic disclosures, which geographies are most prevalent in a given sample of companies, and which topics are frequently discussed in the context of individual assets. By linking the disclosed geoinformation with other spatial datasets, one can also investigate how complete corporate disclosures are and to what extent they feature omissions and biases. Finally, one can assess the geographical alignment between corporate disclosures on sustainability-related topics and academic sources such as the planetary materiality framework.
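A completeness check of this kind could, for instance, match geocoded disclosed locations against the sites of the same company in an external spatial dataset. The sketch below assumes both are available as latitude/longitude pairs and treats a reference site as disclosed if some disclosed location lies within a distance threshold; the site coordinates, the 10 km threshold and the helper names are hypothetical choices, and distance matching is only one simple way to link the datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def disclosure_completeness(disclosed, reference, max_km=10.0):
    """Share of reference sites with a disclosed location within max_km, plus the omitted sites."""
    if not reference:
        raise ValueError("reference dataset is empty")
    omitted = [name for name, loc in reference.items()
               if not any(haversine_km(loc, d) <= max_km for d in disclosed)]
    return 1 - len(omitted) / len(reference), omitted


# Hypothetical data: sites of one company in an external register versus
# geocoded locations extracted from its reports.
REFERENCE_SITES = {"Site 1": (48.14, 11.58), "Site 2": (50.94, 6.96), "Site 3": (53.55, 9.99)}
DISCLOSED = [(48.13, 11.57), (50.93, 6.95)]

coverage, omitted = disclosure_completeness(DISCLOSED, REFERENCE_SITES)
print(f"coverage={coverage:.2f}, omitted={omitted}")  # coverage=0.67, omitted=['Site 3']
```

Aggregating the omitted sites across many companies, e.g. by country or sector, is one way to look for systematic biases rather than isolated gaps.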

Information extraction and data quality issues in corporate sustainability reporting

In the absence of comprehensive registers of companies’ environmental impacts, compiling datasets from unstructured corporate sustainability reports has emerged as a second-best solution. This information extraction task can be performed faster and at greater scale by deploying Natural Language Processing (NLP) techniques, including the integration of Large Language Models (LLMs) into information retrieval systems.
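A common way to integrate an LLM into such a system is to first retrieve the report passages most likely to contain the target values and then pass only those to the model. The sketch below shows this retrieval step under simplifying assumptions: relevance is scored by plain term overlap (real systems often use embedding-based retrieval instead), and the query terms, prompt wording and report snippets are hypothetical. The LLM call itself is left out because no particular model or API is implied.

```python
import re

# Hypothetical query terms for locating GHG figures; a real system might use
# embedding-based retrieval instead of this simple term overlap.
QUERY_TERMS = {"scope", "emissions", "tco2e", "ghg", "greenhouse"}
TOKEN = re.compile(r"[a-z0-9]+")


def score(chunk, terms):
    """Number of distinct query terms that occur in the chunk."""
    return len(terms & set(TOKEN.findall(chunk.lower())))


def top_chunks(chunks, terms=QUERY_TERMS, k=2):
    """Rank chunks by term overlap and keep up to k chunks that match at all."""
    ranked = sorted(chunks, key=lambda c: score(c, terms), reverse=True)
    return [c for c in ranked[:k] if score(c, terms) > 0]


def build_prompt(passages):
    """Assemble an extraction prompt restricted to the retrieved passages."""
    instructions = ("Extract the reported Scope 1 and Scope 2 greenhouse gas emissions "
                    "(value, unit, reporting year) from the passages below. "
                    "Answer 'not found' if a value is not reported.")
    return instructions + "\n\n" + "\n\n".join(passages)


# Invented report snippets.
CHUNKS = [
    "Our Scope 1 emissions were 12,300 tCO2e in 2022.",
    "The board met six times during the year.",
    "Greenhouse gas (GHG) accounting follows the GHG Protocol.",
]

print(build_prompt(top_chunks(CHUNKS)))
```

Restricting the model to retrieved passages keeps prompts short for long reports and makes each extracted value traceable to a specific passage, which helps later quality checks.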

When deploying such systems to generate datasets, however, one must take two dimensions of data quality into account. First, one has to check how well the information extraction system finds relevant data in the documents and excludes irrelevant values. We study this dimension in the context of companies’ Greenhouse Gas reporting by comparing LLM-extracted values against a benchmark dataset that human experts created in several rounds. Second, one has to check how accurately the values disclosed in the reports measure the respective environmental aspect. This step is necessary because the experience and effort companies invest in disclosing data such as Greenhouse Gas emissions vary widely. To gauge the quality of reported values, we propose a typology of 30 interrelated errors and issues in emissions reporting (see figure) that can be inferred from contextual information in the reports.
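The first dimension can be quantified with precision (the share of extracted values that are correct) and recall (the share of benchmark values that were found). The sketch below is one possible scoring scheme, not the project's evaluation protocol: values are keyed by a hypothetical (company, year, scope) tuple, all numbers are invented, and the 1% relative tolerance is an arbitrary assumption.

```python
import math

# Keys: (company, reporting year, scope); values in tCO2e. All numbers invented.
BENCHMARK = {
    ("CompanyA", 2022, "scope1"): 12300.0,
    ("CompanyA", 2022, "scope2"): 4100.0,
    ("CompanyB", 2022, "scope1"): 980.0,
}
EXTRACTED = {
    ("CompanyA", 2022, "scope1"): 12300.0,  # correct
    ("CompanyA", 2022, "scope2"): 41000.0,  # off by a factor of ten
    ("CompanyB", 2022, "scope1"): 981.0,    # within tolerance
    ("CompanyB", 2021, "scope1"): 950.0,    # key absent from the benchmark
}


def evaluate(extracted, benchmark, rel_tol=0.01):
    """Precision and recall of extracted values against a human-made benchmark.

    An extracted value counts as correct if the benchmark has the same key and
    the two values agree within the relative tolerance rel_tol.
    """
    correct = sum(
        1 for key, value in extracted.items()
        if key in benchmark and math.isclose(value, benchmark[key], rel_tol=rel_tol)
    )
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(benchmark) if benchmark else 0.0
    return precision, recall


precision, recall = evaluate(EXTRACTED, BENCHMARK)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.50, recall=0.67
```

Note that this scheme counts the 2021 value as a false positive simply because the benchmark does not cover that year; whether such values should be excluded instead is a design choice that depends on how the benchmark was scoped.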