How S&P is using deep web scraping, ensemble learning and Snowflake architecture to collect 5X more data on SMEs
Previously, S&P had data on only about 2 million small and medium-sized enterprises (SMEs); its AI-powered RiskGauge platform has expanded that to 10 million.
The platform, which went into production in January, is based on a system built by Hadi’s team. It pulls firmographic data from unstructured web content, combines it with anonymized third-party datasets, and applies machine learning (ML) and advanced algorithms to generate credit scores.

Hadi explained that RiskGauge uses a multi-layer scraping process that pulls various details from a company’s web domain, such as basic ‘Contact us’ pages, landing pages and news-related information. Because sites vary so widely, the team didn’t want to hard-code extraction rules or incorporate robotic process automation (RPA) into the system, Hadi said; they knew the most important information they needed was in the text.
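The article does not describe RiskGauge’s code, so the following is only a minimal Python sketch of the text-first idea: visit a few “layers” of a company’s domain and keep the visible text, without site-specific selectors or RPA scripts. The layer names and paths in `LAYER_PATHS`, and the helpers `html_to_text` and `scrape_domain`, are hypothetical illustrations, not S&P’s implementation. The fetcher is passed in as a parameter so the logic can be exercised without network access.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical "layers" to visit on a company domain; real paths vary by site.
LAYER_PATHS: Dict[str, str] = {
    "landing": "/",
    "contact": "/contact-us",
    "news": "/news",
}


class TextExtractor(HTMLParser):
    """Collects visible text from any page; no site-specific selectors."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # >0 while inside script/style/noscript
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Reduce raw HTML to whitespace-joined visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_url(url: str, timeout: float = 10.0) -> Optional[str]:
    """Default fetcher: returns the page body, or None on any error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return None


def scrape_domain(
    domain: str, fetch: Callable[[str], Optional[str]] = fetch_url
) -> Dict[str, str]:
    """Visit each layer of the domain and return {layer: visible text}."""
    base = f"https://{domain}"
    texts: Dict[str, str] = {}
    for layer, path in LAYER_PATHS.items():
        html = fetch(urljoin(base, path))
        if html:  # missing pages are simply skipped
            texts[layer] = html_to_text(html)
    return texts


# Offline demo with a stubbed fetcher (a dict lookup instead of HTTP).
demo_pages = {
    "https://example.com/": "<h1>Acme Tools</h1><script>var x = 1;</script>",
    "https://example.com/contact-us": "<p>Call us at 555-0100</p>",
}
print(scrape_domain("example.com", fetch=demo_pages.get))
```

Because every layer is reduced to plain text, the same code handles very different site layouts; in a pipeline like the one described, that text (plus the third-party data) would then feed the ML scoring step, which this sketch does not attempt to model.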
Or read this on VentureBeat