NLP for Tracking Media Coverage

How Barings tracks the buzz on companies

  • Barings Asset Management wanted to monitor media coverage of companies in which they invest. 
  • They decided to build an in-house solution. 
  • Named Entity Recognition is the way a machine spots specific names, objects and concepts.
  • Barings used NER to tell apart stories about companies vs. stories about their products.

Asset managers want to stay informed

Investment managers sell their ability to extract value signals from the information available to them. Asset managers therefore consume large amounts of textual information from different media sources.

But markets are now saturated with text data. From financial news to blogging and tweeting, live-stream texts both discuss and influence asset pricing. Especially during earnings season, when news flow becomes particularly voluminous, ingesting all this information becomes a challenge.

Barings Asset Management sought to develop an internal solution to help consume information at scale and augment the investment research process.

NLP Concept: Named Entity Recognition

Machines break text apart into smaller units. These units are called tokens, and the process is called Tokenization. Usually, each word is a token. But, for example, a word like ‘They’re’ is two tokens: ‘They’ and ‘are’.
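As an illustration, a toy tokenizer might look like the sketch below. The contraction table and punctuation handling are invented for the example; real tokenizers are far more thorough.

```python
# A minimal, illustrative tokenizer -- not any specific library's algorithm.
# It splits on whitespace and expands a few common English contractions,
# so that "They're" becomes the two tokens "They" and "are".
CONTRACTIONS = {
    "they're": ["They", "are"],
    "don't": ["do", "not"],
    "it's": ["it", "is"],
}

def tokenize(text):
    tokens = []
    for word in text.split():
        word = word.strip(".,!?")                 # drop trailing punctuation
        expansion = CONTRACTIONS.get(word.lower())
        if expansion:
            tokens.extend(expansion)              # one word, two tokens
        else:
            tokens.append(word)
    return tokens

print(tokenize("They're buying BP shares."))
# ['They', 'are', 'buying', 'BP', 'shares']
```

A production tokenizer would also keep punctuation as separate tokens and preserve the original casing of expansions, among many other details.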

Named entities are tokens (or a series of tokens) representing specific names, objects or concepts. Countries and currencies, like ‘France’ or ‘JPY’, are examples of named entities.

Within financial services, common entities also include companies (‘BP’) and regulators (‘Financial Conduct Authority’).

Named-Entity Recognition (NER) is the task of automatically spotting when tokens are named entities, and labeling them correctly. Often, clues like uppercase letters, or acronyms (such as ‘FDA’) help to flag a named entity.

Still, company names like ‘adidas’ or ‘3i’ can be challenging for rule-based named-entity recognition. This is one reason domain-specific applications usually require additional expertise, in order to update automated tools with manually labeled entity lists.
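A rule-based approach of this kind might be sketched as follows. The entity list, labels and heuristics here are invented for illustration, not Barings' actual rules.

```python
# Toy rule-based NER sketch (hypothetical, for illustration only).
# It combines simple surface clues (capitalisation, acronyms) with a
# manually curated entity list for tricky names like 'adidas' or '3i'.

KNOWN_ENTITIES = {"adidas": "ORG", "3i": "ORG", "jpy": "CURRENCY"}

def tag_entities(tokens):
    tagged = []
    for tok in tokens:
        if tok.lower() in KNOWN_ENTITIES:        # curated list takes priority
            tagged.append((tok, KNOWN_ENTITIES[tok.lower()]))
        elif tok.isupper() and len(tok) >= 2:    # acronym clue: 'FDA', 'BP'
            tagged.append((tok, "ORG"))
        elif tok[:1].isupper():                  # capitalisation clue
            tagged.append((tok, "ENTITY?"))
        else:
            tagged.append((tok, "O"))            # 'O' = not an entity
    return tagged

print(tag_entities(["The", "FDA", "approved", "it", "and", "adidas", "rallied"]))
```

Note that the capitalisation clue wrongly flags sentence-initial words like ‘The’, which is exactly why real systems need statistical models and curated lists on top of surface rules.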

Here you can see a visualisation of NER. An open-source NLP software package correctly identifies a name, a date, and an organisation.

Barings built an in-house solution

Beginning in 2018, Barings researchers developed a system that helps digest media content. It identifies the relevant company discussed in a document, tags the document with the correct internal company ID and generates a sentiment score for the content. (More on Sentiment Analysis in the American Century Investments case study.)

Developers discovered that financial news coverage is skewed towards a small selection of companies relative to the overall investment universe. On top of that, they needed to tell apart stories discussing a popular company’s product from stories about the company itself (e.g. Facebook the platform vs. Facebook the corporate entity).

The researchers trained machine-learning models (see ML Concept below) to tell the two story types apart, as only stories about the company itself were deemed relevant.

Concept: Model Training

Under the hood of almost any piece of software is an excruciatingly detailed recipe, telling the machine how to treat any input it receives. Unless the recipe itself changes, the software will behave exactly the same over time.

This is in stark contrast to how humans learn to do things. When a person learns to catch a ball, for instance, they do so without first being taught complicated motion equations. Instead, people practice. We often learn best through repeated trial and error.

For highly complicated tasks (face recognition, for example), algorithm developers struggle to specify a well-performing exact recipe. As a field, Machine Learning concerns itself with developing automated solutions that can adapt performance over time.

A Model is a software component. Its performance depends on the values of a set of parameters (from as few as one parameter, to over a billion). Model Training is the process of finding the best values for these parameters, so as to minimize the number of mistakes the model makes when performing its task.

Training is done by showing the model many data examples. For some (or all) of the data examples, the model also sees the desired outcome. Through a feedback loop, the model adjusts its parameters to improve its success rate.
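The feedback loop described above can be sketched with a toy perceptron-style classifier applied to the company-vs-product problem. The features, data and model here are invented for illustration, not Barings' actual system.

```python
# Toy training loop (perceptron-style), illustrating the feedback idea.
# Task: label a headline 1 if it is about the company itself,
# 0 if it is about the company's product.

FEATURES = ["shares", "earnings", "app", "update"]  # hypothetical vocabulary

def featurize(headline):
    words = headline.lower().split()
    return [1.0 if f in words else 0.0 for f in FEATURES]

def train(examples, epochs=10, lr=0.5):
    weights = [0.0] * len(FEATURES)                 # the model's parameters
    bias = 0.0
    for _ in range(epochs):
        for headline, label in examples:
            x = featurize(headline)
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if score > 0 else 0
            error = label - prediction              # feedback signal
            # Nudge each parameter in the direction that reduces the error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def predict(params, headline):
    weights, bias = params
    x = featurize(headline)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

data = [  # invented labeled examples: (headline, desired outcome)
    ("Facebook shares jump on earnings", 1),
    ("Facebook app update adds features", 0),
    ("Strong earnings lift shares", 1),
    ("New app update released", 0),
]
params = train(data)
print(predict(params, "earnings beat lifts shares"))  # 1 (company story)
```

The same principle scales up: production models use far richer features and millions of parameters, but the loop of predict, compare to the desired outcome, and adjust is the same.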

How finance uses NLP