Background and business need
A leading US-based information publisher that provides data-driven insights to its customers needed a database refresh service.
Its database covered more than 250 data points, as well as company-related information such as employee data, services, products, and affiliates, all of which needed validation.
Aggregating and validating millions of records and managing high-volume processes in-house was difficult. The work called for an automated approach, and the publisher chose RefreshData360 to help aggregate, refresh, and validate its database.
The solution given by Xtract.io
The data stewards at Xtract.io analyzed the challenges and implemented a step-by-step solution using the company's robust, scalable in-house platforms, Worxtream and Mojo.
It was challenging to monitor the data across numerous websites and track the changes.
RefreshData360 brings together its proprietary tools, Worxtream and Mojo, to automate intensive data refresh projects. The Web Change Monitoring (WCM) bots monitor and crawl different websites to aggregate the required information in real time. These bots refresh the databases automatically and update them with relevant data.
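To make the idea of web change monitoring concrete, here is a minimal sketch of one common approach: fingerprint each page's text and compare it with the fingerprint from the previous crawl. This is an illustration only, not a description of how the WCM bots are built. The function names `fingerprint` and `detect_change`, the in-memory `seen` store, and the example URL are all assumptions made for the example.

```python
import hashlib
from typing import Dict


def fingerprint(content: str) -> str:
    """Return a SHA-256 digest of the page text with whitespace normalized."""
    # Collapse runs of whitespace so purely cosmetic changes are ignored.
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(url: str, content: str, seen: Dict[str, str]) -> bool:
    """Return True if the page is new or differs from the last crawl.

    `seen` maps URL -> last known fingerprint and is updated in place.
    A production system would use a persistent store (hypothetical here).
    """
    digest = fingerprint(content)
    changed = seen.get(url) != digest
    seen[url] = digest
    return changed


if __name__ == "__main__":
    store: Dict[str, str] = {}
    page = "https://example.com/acme"  # placeholder URL
    print(detect_change(page, "Acme Corp  100 employees", store))  # first crawl
    print(detect_change(page, "Acme Corp 100 employees", store))   # spacing only
    print(detect_change(page, "Acme Corp 120 employees", store))   # real change
```

Only pages whose fingerprint changes would need to be re-extracted and passed on for validation, which keeps the refresh workload proportional to what actually changed.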
It was challenging to validate huge volumes of data and ensure accuracy.
RefreshData360 deploys Mojo, an in-house data curation and validation platform powered by a human-in-the-loop mechanism. Mojo applies more than 1,000 validation rules in real time to check the data aggregated by the WCM bots for accuracy and authenticity. This helps identify and flag errors in the data, formats, and field logic set for each data point. The platform also normalizes and standardizes the data into the prescribed formats. RefreshData360 helped this client reduce manual rework by 90% and address oversight errors.
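As a rough illustration of rule-based validation with a human-in-the-loop step, the sketch below runs each record through a small set of named rules and routes any record that fails to a review queue. It is not Mojo's implementation. The three rules, the field names (`company_name`, `employee_count`, `website`), and the `triage` function are hypothetical and stand in for the much larger rule set described above.

```python
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical rules: each pairs a name with a check that returns True on pass.
RULES: List[Rule] = [
    ("name_present", lambda r: bool(r.get("company_name", "").strip())),
    ("employee_count_numeric", lambda r: r.get("employee_count", "").isdigit()),
    ("website_format",
     lambda r: re.fullmatch(r"https?://\S+", r.get("website", "")) is not None),
]


def validate(record: Record) -> List[str]:
    """Return the names of all rules the record fails, in rule order."""
    return [name for name, check in RULES if not check(record)]


def triage(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into auto-accepted ones and ones flagged for human review."""
    accepted: List[Record] = []
    review: List[Tuple[Record, List[str]]] = []
    for record in records:
        failures = validate(record)
        if failures:
            review.append((record, failures))  # a person inspects these
        else:
            accepted.append(record)
    return accepted, review
```

Keeping each rule as a named, independent check means a reviewer sees exactly which rules a flagged record failed, and new rules can be added without touching the triage logic.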
It was challenging for the client to access the huge volume of refreshed data.
Once the data had been verified at multiple levels, the next challenge was to feed the refreshed data back into the client's interfaces without oversight errors. RefreshData360 uses custom-built APIs to deploy the validated data seamlessly to those interfaces.