What is a data mart, and how does it differ from a data warehouse?
Over the last ten years or so, the dizzying increase in the amount of data produced has accelerated the development of Big Data. The worlds of application development and data management have begun to converge.
In this context, knowing how to centralise, structure, process and analyse a mass of data for a specific problem is essential: that's what a data mart is all about. What exactly does this concept cover? And how is it different from a data warehouse? Here are some explanations.
What is a data mart?
Data mart: definition
A data mart, also known as a data shop or data counter, is a specific database intended for a given group of users.
Used in business intelligence, its data is extracted from source systems, cleaned and made available to users in a specific area of the business, or to a restricted group of users.
Data mart example - © Talend
👉 The data mart must serve the end users' need: it should present the data initially stored in the data warehouse as intelligibly as possible, in terms as close as possible to the language of the business.
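To make the "extract, clean, expose in business language" idea more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (wh_training_raw, hr_training_mart, emp_nm and so on) are hypothetical and chosen only for illustration; a real data mart would usually be fed by an ETL tool from an actual warehouse.

```python
import sqlite3


def build_hr_mart(conn: sqlite3.Connection) -> None:
    """Build a small HR data mart from a (hypothetical) raw warehouse table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS hr_training_mart;
        CREATE TABLE hr_training_mart AS
        SELECT
            TRIM(emp_nm)         AS employee_name,   -- business-friendly names
            UPPER(TRIM(dept_cd)) AS department,
            crs_hrs              AS training_hours
        FROM wh_training_raw
        WHERE crs_hrs IS NOT NULL AND crs_hrs >= 0;  -- basic cleaning rule
        """
    )


# Illustrative data only: technical column names and untidy values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wh_training_raw (emp_nm TEXT, dept_cd TEXT, crs_hrs REAL)")
conn.executemany(
    "INSERT INTO wh_training_raw VALUES (?, ?, ?)",
    [("  Alice ", "hr", 4.0), ("Bob", "HR ", None), ("Chloe", "it", 2.5)],
)

build_hr_mart(conn)
mart_rows = conn.execute(
    "SELECT employee_name, department, training_hours "
    "FROM hr_training_mart ORDER BY employee_name"
).fetchall()
print(mart_rows)
```

The row with no training hours is filtered out, and the remaining rows carry trimmed, consistently cased values under names an HR user would recognise.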
Example of a data mart
For example, within a company's HR department, a first data mart might compile all the indicators relating to the use of the main ERP, while other 'building blocks' of the HR requirement might be data marts directly associated with very specific secondary applications, such as monitoring employee e-learning.
Advantages of data marts
- Provides users with a full range of indicators for the data they need on a day-to-day basis.
- The same group of users can have access to a single data mart or to several data marts, each corresponding to a specific need, depending on the IT architectures in place and the confidentiality of the data.
Data mart vs. data warehouse: what are the differences?
Depending on how it is designed, the data warehouse can be seen as a set of data marts and the gateways between them or, more commonly, as a single central system that ensures the security, availability and technical consistency of all the data used by the data marts.
The data warehouse is therefore more technical in nature: it will probably not have a single "Sales" field, but rather several components of the company's income and expenses, which each business area will combine according to its own definition of sales.
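As an illustration of the "Sales" point above, the sketch below assumes the warehouse stores revenue components separately and that two business areas derive their own "sales" figure from them. The RevenueLine fields and both formulas are hypothetical; actual definitions depend on each company's accounting rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevenueLine:
    """One warehouse record holding separate components (hypothetical fields)."""
    gross_amount: float
    discount: float
    tax: float
    refunded: float


def sales_for_finance(lines: list[RevenueLine]) -> float:
    # Assumed finance view: net of discounts, taxes and refunds.
    return sum(l.gross_amount - l.discount - l.tax - l.refunded for l in lines)


def sales_for_sales_team(lines: list[RevenueLine]) -> float:
    # Assumed commercial view: amounts invoiced after discount only.
    return sum(l.gross_amount - l.discount for l in lines)


# Illustrative figures only.
warehouse_lines = [
    RevenueLine(gross_amount=1000.0, discount=100.0, tax=150.0, refunded=0.0),
    RevenueLine(gross_amount=500.0, discount=0.0, tax=75.0, refunded=50.0),
]

finance_sales = sales_for_finance(warehouse_lines)
commercial_sales = sales_for_sales_team(warehouse_lines)
print(finance_sales, commercial_sales)
```

Both data marts read the same warehouse records, yet each reports a different "sales" total, which is why the warehouse keeps the components rather than a single field.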
The data warehouse will also make it possible to ensure the traceability of information throughout the company, whereas the data mart is limited to satisfying the specific needs of one business line.
How do you build a data mart? 3 options
The data mart integrated into the source application
If you prefer data marts dedicated to an application, it may be because the application itself offers you integrated analysis tools. This seems like the ideal solution.
Advantage: the application's needs are met as closely as possible, and there is consistency between the data and its output.
Disadvantages:
- costs in the medium and long term, as you have no control over the output of the indicators;
- you are less able to enrich it with the rest of your company's data, and vice versa;
- you may be overlooking options for feeding this data back into the data warehouse.
👉 So you lose in potential what you gain in speed of implementation.
The data mart independent of the data warehouse
This is a more advanced version of the previous option: it may have been set up in-house, but it is still fed from one very specific source, on which it depends heavily.
Advantage: you have more latitude when it comes to rendering elements.
Disadvantage: the fact that it is not integrated with the rest of your data warehouse always reduces your potential to respond to user needs in the medium term.
The data mart as a building block of the data warehouse
Data marts should be built around a data warehouse in order to maximise their potential. Their integration can be:
- ↗️ bottom-up: a set of data marts is combined to form a data warehouse,
- ↘️ top-down: the centralisation of data in the data warehouse enables all the necessary building blocks to be created.
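One possible way to picture the top-down approach is to define each data mart as a view over a central warehouse table. The sketch below uses Python's built-in sqlite3 module; the warehouse_facts table, its columns and the two view names are assumptions made for the example, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical central warehouse table with illustrative rows.
    CREATE TABLE warehouse_facts (domain TEXT, indicator TEXT, period TEXT, value REAL);
    INSERT INTO warehouse_facts VALUES
        ('hr', 'training_hours', '2024-01', 120.0),
        ('production', 'incidents', '2024-01', 3.0),
        ('hr', 'headcount', '2024-01', 48.0);

    -- Each data mart is a filtered, business-facing slice of the warehouse.
    CREATE VIEW hr_mart AS
        SELECT indicator, period, value FROM warehouse_facts WHERE domain = 'hr';
    CREATE VIEW production_mart AS
        SELECT indicator, period, value FROM warehouse_facts WHERE domain = 'production';
    """
)

hr_indicators = [
    row[0] for row in conn.execute("SELECT indicator FROM hr_mart ORDER BY indicator")
]
production_count = conn.execute("SELECT COUNT(*) FROM production_mart").fetchone()[0]
print(hr_indicators, production_count)
```

Because the views read from the same table, any correction made in the warehouse is reflected in every data mart. A bottom-up design would instead start from separate mart tables and consolidate them afterwards.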
Advantages:
- connection with other areas of the business, enabling key indicators of your performance to be refined and explained precisely. For example, you can:
  - highlight a correlation between falling results on a particular learning path of your e-learning platform and an increase in incidents on a production line;
  - optimise your production rate based on an analysis of the pipeline in your CRM tool;
- the arrangement of these building blocks within or around a data warehouse increases your chances of ensuring that your indicators are correctly interpreted for cross-functional use over the long term.
Disadvantage: loss of independence
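The cross-domain analysis mentioned in the advantages above (e-learning results versus production incidents) can be sketched as follows. The monthly figures are invented purely for illustration, and the pearson helper is a plain implementation of the Pearson correlation coefficient written for this example; it assumes both data marts share a common "month" key, which is what integration with a data warehouse is meant to guarantee.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical extracts from two data marts, keyed by month.
elearning_scores = {"2024-01": 82.0, "2024-02": 78.0, "2024-03": 71.0, "2024-04": 65.0}
incidents = {"2024-01": 3, "2024-02": 4, "2024-03": 7, "2024-04": 9}

# Only months present in both marts can be compared.
shared_months = sorted(elearning_scores.keys() & incidents.keys())
r = pearson(
    [elearning_scores[m] for m in shared_months],
    [float(incidents[m]) for m in shared_months],
)
print(f"correlation over {len(shared_months)} months: {r:.2f}")
```

A strongly negative coefficient here would only suggest that the two indicators move in opposite directions; it does not by itself establish a cause.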
Which tools for my data marts?
Of course, there is no shortage of ETL tools for processing large volumes of data and analysing them quickly.
There are also dedicated turnkey storage tools for your data mart, both open source and proprietary.
As with any choice that pits open source against vendor solutions, support and in-house capacity to develop or adapt components will be the criteria to take into account.
From data mart to DataOps
Integrating your data marts into a data warehouse should be a major objective of your architecture. And the proper evolution of this data warehouse is its corollary.
As technical teams face ever-increasing demands and a growing need for responsiveness, we have had to adapt our development and deployment methods using the continuous integration techniques that have proved their worth in the application world. Data engineering must therefore adopt a new paradigm: DataOps, derived from DevOps.
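One DataOps practice borrowed from continuous integration is to run automated checks on a data mart before each deployment. The sketch below is a minimal, hypothetical example of such a check in Python; the function name check_mart_rows, the column names and the two rules (no missing values, no duplicate keys) are assumptions, and real pipelines typically rely on a dedicated testing framework.

```python
def check_mart_rows(rows: list[dict], key: str, required: list[str]) -> list[str]:
    """Return a list of human-readable problems found in data mart rows."""
    problems = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Rule 1: every required column must hold a value.
        for column in required:
            if row.get(column) is None:
                problems.append(f"row {index}: missing {column}")
        # Rule 2: the business key must be unique across the mart.
        value = row.get(key)
        if value in seen_keys:
            problems.append(f"row {index}: duplicate {key}={value!r}")
        seen_keys.add(value)
    return problems


# Illustrative rows only.
sample = [
    {"employee_id": 1, "training_hours": 4.0},
    {"employee_id": 2, "training_hours": None},
    {"employee_id": 1, "training_hours": 2.0},
]

problems = check_mart_rows(
    sample, key="employee_id", required=["employee_id", "training_hours"]
)
for problem in problems:
    print(problem)
```

In a continuous integration job, a non-empty list of problems would fail the build and stop a faulty data mart from reaching its users.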
In short, adapting the principles of DevOps to the world of Data offers a new response to the challenges of setting up data marts in a context of strong growth.