Business Insider

Cleanlab Announces New Data-Centric AI Software to Reinvent Data Quality and Data Science

Cleanlab, a recognized leader in AI software that automatically ensures high-quality data, is announcing new capabilities for its data-centric AI platform. These innovations build on the pioneering data correction technology invented by the Cleanlab founders as PhD students at MIT.

The latest release of the Cleanlab platform aims to fundamentally transform how enterprises approach critical processes like data curation, data annotation, diagnosing/correcting problems in a dataset, and deploying models for AI or Analytics.

“Data is the currency of enterprise AI, but real-world datasets inevitably contain quality issues that undermine its value,” said Jonas Mueller, PhD, Chief Scientist at Cleanlab. “We completely reimagined a modern AI platform that embeds automated data curation directly within modeling workflows. As you’re building AI for your dataset, our platform continuously alerts you of potential data problems and suggests fixes.” 

This unified approach allows Cleanlab’s data-centric AI engine to continuously improve dataset quality and machine learning reliability in tandem. Proprietary algorithms assess the information in individual data points, estimating which data can be confidently auto-labeled by AI, which data would be most informative to collect additional annotations for, and which data has issues.
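The three-way triage described above can be illustrated with a minimal sketch. It assumes a classifier has already produced per-class probabilities for each data point, and it uses simple confidence thresholds. This is not Cleanlab's proprietary algorithm; the function name `triage` and both threshold values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence


def triage(
    pred_probs: Sequence[Sequence[float]],
    given_labels: Sequence[Optional[int]],
    auto_label_threshold: float = 0.95,  # hypothetical cutoff
    issue_threshold: float = 0.2,        # hypothetical cutoff
) -> Dict[str, List[int]]:
    """Sort data point indices into buckets using model confidence.

    pred_probs[i] holds the model's class probabilities for point i.
    given_labels[i] is the existing label, or None if unlabeled.
    """
    buckets: Dict[str, List[int]] = {
        "auto_label": [],
        "needs_annotation": [],
        "possible_issue": [],
    }
    for i, (probs, label) in enumerate(zip(pred_probs, given_labels)):
        # Most likely class and the model's confidence in it.
        top_class = max(range(len(probs)), key=probs.__getitem__)
        top_conf = probs[top_class]
        if label is None:
            # Unlabeled: trust the model only when it is very confident.
            if top_conf >= auto_label_threshold:
                buckets["auto_label"].append(i)
            else:
                buckets["needs_annotation"].append(i)
        elif probs[label] <= issue_threshold and top_class != label:
            # Labeled, but the model strongly disagrees with the label.
            buckets["possible_issue"].append(i)
    return buckets


pred_probs = [[0.98, 0.02], [0.55, 0.45], [0.05, 0.95], [0.90, 0.10]]
given_labels = [None, None, 0, 0]
result = triage(pred_probs, given_labels)
```

In this toy input, the first point is unlabeled and confidently predicted, the second is unlabeled and ambiguous, the third carries a label the model strongly disagrees with, and the fourth carries a label the model agrees with, so it lands in no bucket.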

The platform automatically detects mislabeled or mistagged data, data entry errors, outliers, near duplicates, distributional drift, unsafe or low-quality content, and other data problems that plague most enterprise datasets. An intuitive interface enables rapid data improvement at scale, such that a lone data scientist can fix millions of data points. Companies use these data curation capabilities to produce more reliable AI or Analytics in less time and at lower cost, and to present better information to customers (for instance, in a product catalog).

“Rather than manage data issues in a separate environment, we deeply integrated data quality workflows into machine learning development,” Mueller said. “That’s because we use AI to automatically discover what data needs fixing and how to fix it. So the AI development helps improve your data, which in turn improves the AI development, in a virtuous cycle.”  

Unlike other tools specialized for one type of data, the latest Cleanlab release works across structured tabular datasets and unstructured image/text data. The software complements rule-based data quality tools by using AI to automatically uncover issues that teams failed to specify rules for. It also goes beyond traditional data cleaning to modify information without altering underlying structure or schemas.

The same cutting-edge machine learning used to detect data issues and auto-label data is also available for enterprises to deploy in business applications. New capabilities enable single-click model retraining and deployment after data improvements, so users immediately benefit from the highest-quality dataset. All of Cleanlab’s machine learning hinges on carefully calibrated confidence scores. One example is a Large Language Model that estimates its own uncertainty about its responses, which helps mitigate the hallucination problems that plague LLMs like ChatGPT.

“Instead of moving data through various siloed products, teams can now progress raw data to reliable AI rapidly within Cleanlab’s unified no-code platform,” Mueller said. “We redesigned the entire stack to create the fastest path from data to value.” Cleanlab’s data-centric AI platform and the latest upgrades are generally available today. Learn more at cleanlab.ai.

About Cleanlab

Founded in 2021, Cleanlab pioneers data-centric AI to deliver next-generation data quality and machine learning solutions. Their products help Fortune 500 companies across industries including technology, e-commerce, consulting, finance, law, and more achieve reliable, trustworthy AI at scale. Learn more at www.cleanlab.ai.

Media Contact

Organization: Cleanlab

Contact Person: Jonas Mueller

Website: https://cleanlab.ai/

Email: team@cleanlab.ai

City: San Francisco

State: California

Country: United States

Release Id: 2612238534