A well-established B2B commercial data analytics and intelligence corporation wanted to aggregate, enrich, and validate company information and contact data for 350,000 international companies. The data had to be cleaned, enriched, and validated for completeness and accuracy.
The Xtract.io team was responsible for establishing the credibility of the sources from which the data was extracted. Moreover, the data was spread across various websites in multiple languages and had to be delivered within the stipulated time.
The Xtract.io team automated the whole solution with its two AI-powered data platforms, Mojo and Mobito.
Mobito - Source identification and data extraction
Mojo - Data workflows to translate, profile, enrich, validate, and deliver data
Xtract.io automated the whole extraction process with the help of Mobito, a data crawling platform. Credible and valid sources were identified, and crucial information was extracted from them.
The extracted information was then onboarded into the Mojo platform.
Data Translation - A translation bot automatically translated the data into English from multiple languages such as Japanese, Chinese, and Spanish.
Data Annotation - The translated data was then labeled and annotated with predefined categories with the help of an annotation bot.
Data Enrichment - The company and contact data were deduplicated and cleansed of poor data, such as incomplete or missing values.
Data Standardization - The enriched values were then standardized to meet the International Messaging Format and normalized for consistency.
Human-in-the-loop Quality Check - Although the data was qualified for business use at this stage, we found that some quality issues remained. So we employed human reviewers to check the data for any remaining issues and report them immediately.
Data Delivery - The data was delivered to the client’s FTP server with the help of custom-built APIs.
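To make the enrichment and quality-check steps above more concrete, here is a minimal Python sketch of deduplicating records and flagging incomplete ones for human review. It is an illustration only, not Mojo's actual implementation: the field names (name, country, email), the REQUIRED_FIELDS list, and the rule that two records are duplicates when name and country match case-insensitively are all assumptions made for this example.

```python
# Hypothetical required fields; the real schema is not described in this case study.
REQUIRED_FIELDS = ("name", "country", "email")


def normalize(record):
    """Trim whitespace on string values and lowercase the email address."""
    cleaned = {k: (v.strip() if isinstance(v, str) else v) for k, v in record.items()}
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def dedupe_key(record):
    """Assumed duplicate rule: same company name and country, ignoring case."""
    return (
        (record.get("name") or "").casefold(),
        (record.get("country") or "").casefold(),
    )


def missing_fields(record):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def clean(records):
    """Drop duplicates, then split records into complete and needs-review lists."""
    seen = set()
    complete, needs_review = [], []
    for raw in records:
        record = normalize(raw)
        key = dedupe_key(record)
        if key in seen:
            continue  # keep only the first occurrence of each company
        seen.add(key)
        if missing_fields(record):
            needs_review.append(record)  # candidates for the human-in-the-loop check
        else:
            complete.append(record)
    return complete, needs_review


records = [
    {"name": "Acme Ltd ", "country": "JP", "email": "Info@Acme.example"},
    {"name": "acme ltd", "country": "jp", "email": "info@acme.example"},
    {"name": "Globex", "country": "ES", "email": ""},
]
complete, needs_review = clean(records)
```

In this sketch the second Acme record is dropped as a duplicate, and the Globex record, which has no email, lands in needs_review so that a human reviewer can inspect it rather than having it silently discarded.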
Xtract.io delivered high-quality, clean, and complete business data and contact information for more than 350,000 companies within 4 months. The human-in-the-loop feedback mechanism implemented in Mojo improved the data quality by 97%.
Data profiling and annotation
Non-English data translation
>350K companies reviewed
Enhanced data quality by 97%