Strategies

NLP Signal Extraction

Sentiment analysis · Linguistic divergence · Information density

Text as alternative data

The conventional framing treats corporate communications as compliance artifacts: necessary but uninformative. We treat them as one of the richest sources of alternative data in public markets. Corporate text contains thousands of sentences crafted by management, reviewed by counsel, and verified for accuracy. Changes in this language, however subtle, can reflect real shifts in business fundamentals, risk perception, and strategic direction. Our NLP systems are purpose-built to detect these shifts at scale.

Linguistic divergence scoring

For every company, we compute a linguistic divergence score between consecutive reporting periods. The score measures how much the company's language has changed from one period to the next across data sources, after controlling for boilerplate updates and standard revisions. A high divergence score in risk-related content, combined with low divergence in business descriptions, often indicates a company facing new challenges that it has not yet communicated through forward guidance. This pattern precedes earnings surprises at a statistically significant rate.
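
As a rough illustration of the idea, and not a description of the production system, the sketch below scores divergence as the cosine distance between word-count vectors of two periods' text after dropping sentences from a known boilerplate list. The function names, the section keys "risk" and "business", and both thresholds are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def strip_boilerplate(text, boilerplate):
    """Drop sentences that exactly match a known boilerplate sentence."""
    known = {b.lower() for b in boilerplate}
    return " ".join(s for s in split_sentences(text) if s.lower() not in known)


def divergence(prev_text, curr_text, boilerplate=()):
    """Cosine distance between word counts: 0 = same wording, 1 = no shared words."""
    a = Counter(tokenize(strip_boilerplate(prev_text, boilerplate)))
    b = Counter(tokenize(strip_boilerplate(curr_text, boilerplate)))
    if not a or not b:
        return 0.0 if a == b else 1.0
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # Clamp to absorb floating-point error when the two texts are identical.
    return max(0.0, 1.0 - dot / norm)


def flags_unannounced_risk(prev, curr, risk_threshold=0.5,
                           business_threshold=0.1, boilerplate=()):
    """True when risk language moved a lot while the business description did not.

    Both thresholds are illustrative placeholders, not calibrated values.
    """
    risk = divergence(prev["risk"], curr["risk"], boilerplate)
    business = divergence(prev["business"], curr["business"], boilerplate)
    return risk >= risk_threshold and business <= business_threshold
```

A production version would likely use richer text representations and a learned boilerplate filter. The point here is only the shape of the comparison: one score per section, and a flag when the risk and business scores move apart.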

Information density and sentiment

Not all data carries equal information content. Our information density model measures the ratio of novel, company-specific language to standard template language. High-density content receives greater weight in our scoring engine. Separately, our sentiment models are calibrated specifically for financial language: they are trained to distinguish genuinely cautious risk language from standard hedging that appears regardless of a company's actual risk profile.
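
One simple way to picture information density, offered as a sketch under stated assumptions and not as the model itself: treat a sentence as templated if it closely matches any entry in a template list, then take the share of sentences that are not templated. The example uses the novel share (novel sentences over all sentences) so the value stays between 0 and 1. The 0.8 similarity threshold, the template list, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher


def is_templated(sentence, templates, threshold=0.8):
    """True if the sentence is near-identical to any known template sentence."""
    s = sentence.lower()
    return any(
        SequenceMatcher(None, s, t.lower()).ratio() >= threshold
        for t in templates
    )


def information_density(sentences, templates, threshold=0.8):
    """Share of sentences that do not match a template (0.0 to 1.0)."""
    if not sentences:
        return 0.0
    novel = sum(1 for s in sentences if not is_templated(s, templates, threshold))
    return novel / len(sentences)


def density_weighted_score(items):
    """Average of signal scores, weighting each by its content's density.

    `items` is an iterable of (signal_score, density) pairs.
    """
    items = list(items)
    total = sum(density for _, density in items)
    if total == 0:
        return 0.0
    return sum(score * density for score, density in items) / total
```

For instance, a passage with one stock disclaimer and one specific statement about a cancelled contract would score 0.5 under this definition, and its signal would count for half as much as a fully novel passage in the weighted average.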