ING is a pioneer in digital banking and one of the most innovative banks in the world. The Data Analytics Platform team, led by Bolke de Bruin, enables last-mile analytics for ING's analysts and data scientists, who need to access data from various silos within ING to produce insights and models for use cases such as anti-money laundering, Know Your Customer, and transaction monitoring.
- ING's key challenge for data analysts and data scientists was that data was scattered around various opaque silos.
- ING explored a few open-source and commercial solutions for their Data Analytics Platform before landing on Amundsen. They chose Amundsen because it relies on automation to help users understand and trust data, and because it integrates with the modern data stack.
- Amundsen has enabled direct communication channels between data users and data owners.
- Amundsen is widely used at ING, by 700+ data scientists and data analysts. Adoption is growing by ~10% every month.
Data spread across opaque silos leading to redundant work and wasted time among data scientists and data analysts
ING's data analysts and data scientists were constantly creating new data sets, but their peers didn't know that these data sets existed or whether they could be used for their own use cases. It was also common for the data they needed to already exist in a different organizational unit that they weren't aware of. Consequently, data analysts and data scientists ended up recreating the data from source, which was wasted effort. Even worse, they had to wait for the source data to be ingested into their environment, which led to several months of waiting.
Obtaining data within the company and bringing it into ING's centralized data lake could take several months. More than 80% of the wall-clock time was spent just waiting for data to be available in the right environment and then cleaning it. And all this before the user even had a chance to use the data!
Bolke de Bruin, VP of Engineering Advanced Analytics / Artificial Intelligence, explained:
"People were constantly creating new data sets and the only way, at that time, to manage trust was through curation - rubber stamping that this data was good or not good. This doesn’t scale. We had to do two things - automate trust and federate communication."
Self-serve data discovery with Amundsen
ING's Data Analytics Platform team tried a few different solutions. A part of the company was already using Apache Atlas. However, it was missing some key metadata attributes and its user experience was not intuitive. Data users needed to know where the data was coming from, what it contained, who was using it and how. Because Apache Atlas fell short on these needs, data analysts and data scientists did not adopt it.
The team also looked at a few proprietary solutions, but these were always falling behind in integrations and other product innovation. The team also couldn't ingest ING-specific metadata into them.
de Bruin shared:
"We chose Amundsen for data discovery for ING's Data Analytics Platform because it relied on automation to help data analysts and data scientists discover, understand and trust the data. We really liked that Amundsen integrates with the modern data ecosystem and how it has enabled us to open direct communication channels between the analysts and data scientists who use the data and the data owners who produce the data."
Amundsen and the Data Analytics Platform have led to a flywheel effect at ING. An analyst creates a data set, which gets discovered and used by a data scientist, who then creates a model that gets fed into an application, which produces more data that is in turn discoverable to others in Amundsen.
Amundsen has enabled users to understand what data is OK to use for what purpose. The team at ING is in the process of connecting Amundsen with its policy engine. Once complete, users will be able to discover data sets and request access to data within Amundsen. Requesting access would trigger an existing internal workflow for granting access, so Amundsen would become the entry point for such requests for analysts and data scientists.
de Bruin shared an example where Amundsen had a profound impact at ING. One team member at ING wrote a Spark transformer that transformed SWIFT messages from the SWIFT financial transaction system into tables. These messages are really hard to parse correctly and are in widespread use at a financial institution like ING. Analysts and data scientists were able to discover, understand and use these transformed messages on their own for diverse use cases in Know Your Customer, anti-money laundering and transaction monitoring. The burden of evangelizing is significantly reduced by the self-serve discovery functionality Amundsen provides out of the box.
ING has also integrated its internal statistical monitoring framework, popmon, with Amundsen. This has enabled users to find and report data issues sooner, and data owners to get notified of and resolve such issues much faster than before.
Enabling data mesh within ING, Amundsen is used by 700+ data analysts and data scientists
Amundsen has enabled data analysts and data scientists to understand what data exists, what it means, how it's generated, how it's being used and by whom. It has enabled these users to reuse existing work rather than reinventing the wheel every time they build new insights or models.
Amundsen has enabled data mesh in the company by creating a communication channel between the analysts and data scientists who use the data and the data owners who produce the data.
de Bruin elaborated: "The Data Analytics Platform, of which Amundsen is a key part, is quoted as the most user-friendly platform at ING. It has really changed the paradigm for analysts and data scientists at ING. It started as a grassroots initiative but has become a global standard."
Sharing the impact and adoption of Amundsen at ING, de Bruin mentioned