
How S&P is using deep web scraping, ensemble learning and Snowflake architecture to collect 5X more data on SMEs


Previously, S&P only had data on about 2 million SMEs, but its AI-powered RiskGauge platform expanded that to 10 million.

The platform, which went into production in January, is based on a system built by Hadi’s team. The system pulls firmographic data from unstructured web content, combines it with anonymized third-party datasets, and applies machine learning (ML) and advanced algorithms to generate credit scores. Hadi explained that RiskGauge employs a multi-layer scraping process that pulls various details from a company’s web domain, such as basic ‘contact us’ pages, landing pages and news-related information. The team did not want to hard-code rules or incorporate robotic process automation (RPA) into the system because sites vary so widely, Hadi said, and they knew the most important information they needed was in the text.
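To make the "information is in the text" idea concrete, here is a minimal sketch of site-agnostic text extraction using only Python's standard library. It is not S&P's implementation, which the article does not describe at this level; the class name `VisibleTextExtractor`, the helper `extract_text` and the sample page are all hypothetical. It shows one way a scraper might avoid per-site rules: ignore the page layout entirely and keep only the visible text for a downstream model.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML page as one space-joined string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page (an assumption for illustration, not real data).
sample = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Acme Ltd</h1><p>Contact us at 12 High Street.</p></body></html>"
)
print(extract_text(sample))
```

Because the extractor never looks at a site's specific layout, the same function can run on a 'contact us' page, a landing page or a news page; deciding what the text means is left to whatever model consumes it.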


Read this on VentureBeat
