Chaos to Calm: BI Reporting
Every SQL developer or DBA’s worst nightmare…
“My report is taking ages to load!”
“The figures on my report are wrong!”
“My two reports are showing different revenue figures!”
These were regular comments to the data team at Evolution, from directors and team leaders attempting to make business decisions based on outdated, slow-performing reports. It led to a reactive team, tending to a stream of Helpdesk-based questions, rather than the proactive work any good development team should strive for.
This was the chaos before the calm…
The SQL code written to populate and retrieve the data behind the existing reporting was not particularly bad code. Attempting to add indexes, refactor code and bend the mechanics of SQL Server to improve population and loading speeds was simply not going to work. The code was built to serve its purpose over 10 years ago, for a company of no more than 10 consultants, each with five KPIs and perhaps a year or two's worth of data.
Evolution’s rapid upscaling over the years meant that the code was now trying to achieve the same processing and reporting for 100+ consultants, all with 10+ KPIs and 12 years of data. Something drastic needed to change in the code, and it needed to be done quickly in order to provide the business with its core reporting, without which the company would struggle to make key decisions.
The legacy code we ran had multiple issues:
- Reports were reading data directly from the highly transactional, data-entry-style databases that our recruitment, financial and intranet-based apps were running on
- Fast-flowing data was causing contention and blocking, which consequently slowed both the population of our reports and data retrieval
- SQL Server was unable to allocate the required disk time to run queries, which also slowed other processes on the servers
- A delete-and-rebuild method was being used, and because of the time it took to rebuild the data structures, the business was left without reporting for periods of the day
- Reports were cleansing the data at the presentation/loading layer rather than the population/transformation layers, meaning data retrieval took minutes rather than milliseconds, thanks to row-by-agonising-row methods
- 12+ hour populations were leaving the company with out-of-date KPIs, making it impossible for consultants to track and measure their own progress
- Multiple sources of truth – reports were coded away from existing methods to work around slow-running queries, leading to differing figures and multiple hard-to-maintain SQL scripts
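To make the "row by agonising row" point concrete, here is an illustrative T-SQL sketch (the table and column names are hypothetical, not Evolution's actual schema). The cursor-based cleanse below is the slow pattern; the single set-based statement after it does the same work at population time, so reports only ever read clean data.

```sql
-- Hypothetical example of the slow pattern: cleansing NULL KPI values
-- at the loading layer, one agonising row at a time.
DECLARE @KpiId INT;
DECLARE kpi_cursor CURSOR FOR
    SELECT KpiId FROM dbo.ConsultantKpi WHERE Value IS NULL;
OPEN kpi_cursor;
FETCH NEXT FROM kpi_cursor INTO @KpiId;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.ConsultantKpi SET Value = 0 WHERE KpiId = @KpiId;
    FETCH NEXT FROM kpi_cursor INTO @KpiId;
END;
CLOSE kpi_cursor;
DEALLOCATE kpi_cursor;

-- The set-based equivalent: one statement, run once during population,
-- leaving nothing to cleanse at report-loading time.
UPDATE dbo.ConsultantKpi SET Value = 0 WHERE Value IS NULL;
```

The difference matters because SQL Server's optimiser can satisfy the set-based version with a single scan, whereas the cursor forces a round trip per row.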
This led to drastic change: a complete re-write of the existing ETL methods. The task was described as replacing the engine in an aged, slow-running and unreliable car. We would worry about the appearance and fancy front-ends later.
After multiple attempts at speeding up the existing code, each of which patched up some slower-running queries for a short period, the inevitable happened. The reports became unusable, to the point where returning data was taking up to 30 minutes.
It was time for a change… The data needed to be stored in a well-structured, cleansed state, away from the busy transactional databases; a single source of truth which everybody understood and could trust. In came BI, a term so commonly thrown around in the SQL industry – this was the gateway to fast reporting, serving the business the data it needed at its fingertips.
Something the data and development teams at Evolution pride themselves on is their use of Kanban and Agile methodologies. These processes were vital to delivering the project continuously, giving the business exactly what it needed in small chunks and, most importantly, in the right order! They allowed us to get value out to stakeholders and consultants fast, with a continuous feedback loop from users directing the data team on where to develop next.
Replacing the old car’s engine – the details:
- A process was added to extract all of our lovely data from the transactional databases – we wanted it away from our busy, fast-moving databases
- This process transformed the data into a heavily foreign-keyed, static fact table (the Data Warehouse)
- This formed the central hub for all of our valuable data and allowed us to build a star schema, relating the fact table to a number of dimension tables (e.g. employees, teams, KPIs)
- We could then periodically update the Data Warehouse, topping it up with any recent activity – a much faster method than the legacy delete-and-rebuild
- Due to the nature and structure of the Data Warehouse, heavy indexing can be applied to make data retrieval for reporting super-fast
- All reports can be built off the fact table, meaning we achieve one version of the truth
- Audit processes are much simpler for the data team, making Helpdesk requests much easier to deal with
- Changes to the data are logged, so we can recreate a previous version of a report exactly as it looked on a particular day
- Having an organised route into the BI data means we can use the Windows login to identify who is requesting which report, and tailor the returned data accordingly
- All login requests and their outcomes are logged, in order to flag errors and track report usage
- 12+ hour populations were reduced to under 15 minutes
- Reports now load within seconds, as opposed to 30 minutes
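The periodic "top-up" and the one-fact-table reporting route can be sketched in T-SQL. Everything here is an assumption for illustration – the schema, table and column names (staging.RecentActivity, dbo.FactConsultantKpi, the Dim tables) are invented, not Evolution's real warehouse:

```sql
-- Hypothetical sketch of the periodic top-up: merge recent activity
-- extracted since the last run into the warehouse fact table, rather
-- than deleting and rebuilding everything.
MERGE dbo.FactConsultantKpi AS tgt
USING (
    SELECT ConsultantId, KpiId, ActivityDate, Value
    FROM staging.RecentActivity
) AS src
ON  tgt.ConsultantId = src.ConsultantId
AND tgt.KpiId        = src.KpiId
AND tgt.ActivityDate = src.ActivityDate
WHEN MATCHED THEN
    UPDATE SET tgt.Value = src.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ConsultantId, KpiId, ActivityDate, Value)
    VALUES (src.ConsultantId, src.KpiId, src.ActivityDate, src.Value);

-- Every report then reads from the one fact table, joining out to the
-- dimensions of the star schema - one version of the truth.
SELECT d.TeamName, k.KpiName, SUM(f.Value) AS Total
FROM dbo.FactConsultantKpi AS f
JOIN dbo.DimConsultant AS c ON c.ConsultantId = f.ConsultantId
JOIN dbo.DimTeam       AS d ON d.TeamId       = c.TeamId
JOIN dbo.DimKpi        AS k ON k.KpiId        = f.KpiId
GROUP BY d.TeamName, k.KpiName;
```

Because the merge only touches rows that have actually changed, the warehouse stays online throughout – unlike the delete-and-rebuild approach, which left the business without reporting while it ran.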
The move to BI methods has made the data team far less reactive and, inevitably, happier to answer queries about the data. We have restored trust in our reports, and we understand exactly how our figures are produced and maintained. Most importantly, the team rarely receives the dreaded comments mentioned at the beginning, as team leaders and directors now have fast-loading, regularly updated and trustworthy reports to enjoy using.
We continue to develop and deliver the reports iteratively, using Kanban to stop starting and start finishing projects.