Our client, a leading public sector authority in the UK, wanted to gather pricing data from more than 20,000 store outlets across the UK. They intended to collect large volumes of pricing data for numerous product and service categories every day and use it to calculate the consumer price index.
The organization wanted to help the government shape policies and regulations based on the consumer price index. They wanted to gather price data automatically to predict changes in the market economy and in citizen spending.
Because the required data was spread across disparate sources, our customer wanted an automated solution that needs very little human intervention to operate. They also wanted to aggregate multiple data attributes from each product category. Because the consumer price index must be accurate, the data had to be reliable and precise, so they needed a solution with built-in data anomaly detection.
Challenges our client faced
Our client previously relied on manual data extraction and integration, which cost them both money and effort. Human errors in data entry affected the calculation, and detecting anomalies was extremely time-consuming.
With prices varying every few seconds, they needed to update their pricing dataset frequently for the calculation process. Without technical support, they found it difficult to gather data on time.
How we solved the problem
Xtract.io developed site-specific custom bots to gather price data from retail sites. The bots were scripted in Perl and Python to collect the data and present it in a convenient form. To automate the whole data extraction process, we used Selenium-based browser automation.
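To illustrate what "site-specific" can mean in practice, here is a minimal Python sketch of the parsing layer only. It assumes the page HTML has already been fetched (for example by a browser automation tool) and shows one parser per site behind a small registry. The site names, markup patterns, and function names are hypothetical and are not taken from the actual bots.

```python
import re
from decimal import Decimal
from typing import Callable, Dict, Optional


def parse_shop_a(html: str) -> Optional[Decimal]:
    """Hypothetical site A: price shown as <span class="price">£1,299.00</span>."""
    match = re.search(r'<span class="price">\s*£([\d,]+\.\d{2})\s*</span>', html)
    return Decimal(match.group(1).replace(",", "")) if match else None


def parse_shop_b(html: str) -> Optional[Decimal]:
    """Hypothetical site B: price in pence in a data-price-pence attribute."""
    match = re.search(r'data-price-pence="(\d+)"', html)
    return Decimal(match.group(1)) / 100 if match else None


# Registry mapping each (hypothetical) site to its own parser.
PARSERS: Dict[str, Callable[[str], Optional[Decimal]]] = {
    "shop-a.example": parse_shop_a,
    "shop-b.example": parse_shop_b,
}


def extract_price(site: str, html: str) -> Optional[Decimal]:
    """Dispatch to the parser for this site; None means no price was found."""
    parser = PARSERS.get(site)
    if parser is None:
        raise KeyError(f"no parser registered for {site}")
    return parser(html)
```

Keeping one small parser per site means a layout change on one retailer only touches that retailer's function, which is one plausible reason to prefer site-specific bots over a single generic scraper.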
After extraction, we loaded the data into a database for deduplication. The deduplicated raw data then went through two levels of quality checks to ensure accuracy. The first level was an ML-based anomaly detection process, and the second was manual validation of randomly sampled data. The end result is a targeted dataset with the most recent and relevant data.
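The pipeline described above (deduplicate, flag anomalies, then sample for manual review) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production system: the record fields (`outlet`, `product`, `date`, `price`) are hypothetical, and a simple median-based outlier score stands in for the ML-based anomaly detection, whose details are not described here.

```python
import random
import statistics
from typing import Dict, List, Tuple

Record = Dict[str, object]  # assumed fields: outlet, product, date, price


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each (outlet, product, date) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["outlet"], rec["product"], rec["date"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def flag_anomalies(records: List[Record],
                   threshold: float = 3.5) -> Tuple[List[Record], List[Record]]:
    """Split records into (normal, suspect) per product.

    Uses a modified z-score based on the median absolute deviation (MAD),
    a statistical stand-in for the ML-based check in the text.
    """
    by_product: Dict[object, List[Record]] = {}
    for rec in records:
        by_product.setdefault(rec["product"], []).append(rec)

    normal, suspect = [], []
    for group in by_product.values():
        prices = [float(r["price"]) for r in group]
        median = statistics.median(prices)
        mad = statistics.median([abs(p - median) for p in prices])
        for rec, price in zip(group, prices):
            # With zero spread there is nothing to compare against.
            score = 0.0 if mad == 0 else 0.6745 * abs(price - median) / mad
            (suspect if score > threshold else normal).append(rec)
    return normal, suspect


def sample_for_review(records: List[Record], fraction: float,
                      seed: int = 0) -> List[Record]:
    """Pick a reproducible random subset for manual validation."""
    if not records:
        return []
    k = min(len(records), max(1, round(len(records) * fraction)))
    return random.Random(seed).sample(records, k)
```

A run would chain the three steps: `normal, suspect = flag_anomalies(deduplicate(raw))`, then pass `normal` to `sample_for_review` so human reviewers spot-check a fraction of the records the automated check let through.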
We performed data extraction and integration throughout the year, even on holidays, for 360-degree coverage of all pricing information from the targeted outlets. We provided dedicated technical support and human expertise to keep the bots running smoothly.
We gathered one million records every day, covering diverse products and services such as aviation and clothing. The collected data was transformed into a client-supported format for ease of use.
Xtract.io became the only offshore data partner of this prestigious organization. Using our cutting-edge data handling approach, they calculated the consumer price index with pinpoint precision. We updated the data in real time, which helped them reflect price changes in their calculations.
Xtract.io provided quality data with more than 98% accuracy using our ML-based approach. With fully automated, customized bots, our client could extract the exact data points they needed from a huge volume of data. The time previously spent on data gathering was reduced to one third with our technical solutions.