Rebuilding the Big Data processing system
Work executed for a Big Data company
Data Overload
The customer specialized in aggregating data on every company in the DACH market, using a big data approach to gather information from public and private registers, including company data, shareholder data, annual reports, and financial reports. The aggregated data supported decisions in lead generation, fraud detection, and anti-money-laundering (AML) use cases.
As the company grew, the amount of data collected increased exponentially, and expansion into new markets and areas of interest was blocked by insufficient application performance. This led to a complete system redesign.
Overcoming Challenges
The business and development teams had to overcome several obstacles to complete the task:
A continuously running system that needed to be completely rewritten with no regressions
Complex algorithms for data aggregation
Accommodating many different external sources with varying data structure and quality
Exponential growth in required processing power with each new market served
The need to support data received out of order
The team faced the challenge of maintaining a system that worked continuously, while simultaneously delivering updates in a manner that was transparent to the client.
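One of the obstacles listed above, data received out of order, can be illustrated with a small sketch. It assumes each incoming update carries the date on which the fact became valid at the source, and it keeps only the newest value per company attribute. The names `Update`, `CompanyStore`, and the field names are hypothetical and are not taken from the project; the actual system may have resolved ordering differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Update:
    company_id: str   # hypothetical key identifying a company
    attribute: str    # e.g. "address" or "legal_form"
    value: str
    effective: date   # assumed: date the fact became valid at the source


class CompanyStore:
    """Keeps the newest value per (company, attribute), whatever the arrival order."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Update] = {}

    def apply(self, update: Update) -> bool:
        """Store the update; return False if an equally new or newer value exists."""
        key = (update.company_id, update.attribute)
        current = self._latest.get(key)
        if current is not None and current.effective >= update.effective:
            return False  # a stale update arrived late, so it is ignored
        self._latest[key] = update
        return True

    def get(self, company_id: str, attribute: str) -> Optional[str]:
        stored = self._latest.get((company_id, attribute))
        return stored.value if stored else None
```

With this approach, an update dated 2021 that arrives after one dated 2023 is rejected, so the stored state does not depend on delivery order.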
Gradual Overhaul
The first phase of the new system was developed as a parallel copy that processed data at the same time as the original system, allowing for easy comparison
Agile releases of individually modified processes followed
Collaboration with data scientists in the preparation and implementation of optimal algorithms for data processing
Optimizing database usage: transactions, locks, and query design
Reorganizing the system into small, separate microservices using a pipes-and-filters architecture
Using the CQRS design pattern to optimize the read and write sides separately
Providing scalability through migration to the cloud
Enabling company growth
Opening up new business opportunities
Reducing costs with on‑demand scaling
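The pipes-and-filters architecture mentioned above can be sketched in a few lines. This is only an in-process illustration using Python generators; in the described system each filter would be a separate microservice, presumably connected through queues or streams. The filter names and the record fields (`name`, `register_id`) are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]
Filter = Callable[[Iterable[Record]], Iterator[Record]]


def drop_incomplete(records: Iterable[Record]) -> Iterator[Record]:
    """Filter: discard records that lack a register identifier."""
    for record in records:
        if record.get("register_id"):
            yield record


def normalize_names(records: Iterable[Record]) -> Iterator[Record]:
    """Filter: trim and upper-case company names so sources become comparable."""
    for record in records:
        yield {**record, "name": record["name"].strip().upper()}


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    """Filter: pass through only the first record seen for each register identifier."""
    seen = set()
    for record in records:
        if record["register_id"] not in seen:
            seen.add(record["register_id"])
            yield record


def pipeline(source: Iterable[Record], *filters: Filter) -> Iterable[Record]:
    """Pipe: feed the output of each filter into the next one."""
    stream = source
    for apply_filter in filters:
        stream = apply_filter(stream)
    return stream


raw = [
    {"name": " Acme GmbH ", "register_id": "HRB 1"},
    {"name": "No Id AG", "register_id": ""},
    {"name": "acme gmbh", "register_id": "HRB 1"},
]
cleaned = list(pipeline(raw, drop_incomplete, normalize_names, deduplicate))
```

Because every filter has the same input and output shape, a single step can be replaced, reordered, or scaled without touching the others, which fits the step-by-step release approach described in this case study.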
Business Growth Achieved
Thanks to the incremental approach, it was possible to replace the system transparently to the clients and to develop a version that fulfilled all requirements. The agile approach enabled the team to introduce changes gradually and to fix issues as soon as they were detected.
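Running the old and new systems in parallel makes regressions detectable by direct comparison. The sketch below shows the idea under simple assumptions: both implementations are plain functions that take the same input, and their outputs can be compared with `!=`. The function name `compare_outputs` is hypothetical, and the comparison tooling used in the project is not described in the source.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def compare_outputs(
    inputs: Iterable[In],
    legacy: Callable[[In], Out],
    candidate: Callable[[In], Out],
) -> List[Tuple[In, Out, Out]]:
    """Run both implementations on the same inputs and collect every mismatch."""
    mismatches = []
    for item in inputs:
        expected = legacy(item)    # result of the original system
        actual = candidate(item)   # result of the rewritten process
        if expected != actual:
            mismatches.append((item, expected, actual))
    return mismatches
```

An empty result for a representative batch of inputs suggests that the rewritten process can replace the original one without a visible change for clients.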
The modified architecture and the migration to the cloud resulted in a significant performance boost, allowing the company to grow its business and enter new markets. The collaboration with data scientists not only led to the use of suitable algorithms but also introduced a new type of data interface, enabling the import of data from additional sources and opening up new business opportunities.