For data engineering and analytics projects to succeed, you need more than the right technology and the right people; you also need the discipline to apply software engineering best practices. What sort of practices am I talking about?
- Make sure your requirements are clear and communicated to all parties
- Make sure those requirements are tested before calling the work “done”
- Make sure they’re retested on a regular basis
Scenario: Multiple SAP Systems
In the diagram above, let’s say that Alice has been granted access to all the data in all of the SAP systems, Bob can only see data that was generated by business unit 1234, and Lin has not been granted access to any SAP data. Now suppose we need to build a view that aggregates data across business units to give us a company-wide look at financial results.
Before building anything that combines data from the different systems, we need to think about what Alice, Bob, and Lin will end up seeing if they run a query against the table or view. Will they see the same results, or will they see different results due to their access privileges? There’s no right or wrong answer to that question; it depends on what you’re building. But you need to make sure that everyone is on the same page before starting development: the business analyst, the data engineers, the people doing the testing, and the consumers of the product once it’s completed. If you don’t spell it out, chances are that not everyone is on the same page. For instance, suppose Dan builds the combined view by reusing the existing views that enforce row-level security (RLS). Each user’s “company-wide” totals would then silently include only the rows that user is allowed to see.
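As a rough illustration of how a combined view can inherit row-level security from the views it reuses, here is a minimal Python sketch. It is not the actual implementation from the scenario: the ledger amounts, the extra business units 5678 and 9012, and the names `LEDGER`, `GRANTS`, `secured_rows`, and `company_total` are all assumptions made up for this example, and plain Python filtering stands in for a database’s RLS.

```python
# Hypothetical ledger rows: (business_unit, amount). All values are made up.
LEDGER = [
    ("1234", 100.0),
    ("5678", 250.0),
    ("9012", 400.0),
]

# Hypothetical grants: which business units each user may read.
GRANTS = {
    "alice": {"1234", "5678", "9012"},  # access to everything
    "bob": {"1234"},                    # business unit 1234 only
    "lin": set(),                       # no SAP access
}


def secured_rows(user):
    """Mimic a view with row-level security: return only permitted rows."""
    allowed = GRANTS.get(user, set())
    return [(bu, amount) for bu, amount in LEDGER if bu in allowed]


def company_total(user):
    """A 'company-wide' total built on top of the secured view."""
    return sum(amount for _, amount in secured_rows(user))


# The same query gives a different answer for each identity.
for user in ("alice", "bob", "lin"):
    print(user, company_total(user))
```

Under these assumed numbers, Alice sees the full total, Bob sees only the amount for business unit 1234, and Lin sees zero, even though all three ran the same query.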
Clear and Communicated Requirements
How can this situation be avoided? Make sure the requirements of your data engineering project are clear and communicated to all parties. Row-level security (RLS) should be explicitly addressed in all requirements involving systems like this. The requirements should state either “Results should be the same regardless of the access level of the user” or “Results will differ depending on the access level of the user.”
The requirements in the scenario above were to build a view that aggregates data across business units to give us a company-wide look at financial results. That seems to imply that anyone using the view will get the same results. Don’t make assumptions — spell it out. Say that “Everyone using this view will get the same results, regardless of row-level security.” Maybe Dan didn’t realize this when building the combined view — if he had, he wouldn’t have built the solution by reusing the views with RLS.
Test Before Calling it Done
Once requirements are clear and spelled out, you need to make sure that they’re tested before calling it “done.” The first question is who does the testing. Obviously, the data engineer should be testing, but they likely have a blind spot or two. Did they read the requirements? Did they interpret them correctly? Did they check that all the requisite software engineering best practices were followed?
For complex data engineering and analytics projects, it’s almost always better to have someone else do some testing, in addition to the data engineer, before calling it “done.” Whoever that person is, they will need different identities or roles set up to simulate all of the different users we mentioned above: Alice, Bob, and Lin. It’s especially important to test with users that have access to most, but not all, of the data, because partially limited access can lead to slight differences that often get overlooked. You don’t want to introduce errors like that into your analytics.
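One way to picture this kind of multi-identity testing is the sketch below, which runs the same query under several identities and reports who gets an unexpected answer. Everything in it is hypothetical: the data, the grants, the fourth test user `carol` (added to represent access to most, but not all, of the data), and the function names. It assumes the requirement was that every user sees the same company-wide total.

```python
# Hypothetical data and grants, for illustration only.
LEDGER = [("1234", 100.0), ("5678", 250.0), ("9012", 400.0)]
GRANTS = {
    "alice": {"1234", "5678", "9012"},  # full access
    "bob": {"1234"},                    # one business unit
    "carol": {"1234", "5678"},          # most, but not all, of the data
    "lin": set(),                       # no access
}


def total_with_rls(user):
    """Implementation that inherits row-level security from its source."""
    allowed = GRANTS.get(user, set())
    return sum(amount for bu, amount in LEDGER if bu in allowed)


def total_without_rls(user):
    """Implementation that ignores the caller's grants by design."""
    return sum(amount for _, amount in LEDGER)


def users_with_unexpected_results(view, expected, users):
    """Return the users whose result differs from the expected value."""
    return sorted(u for u in users if view(u) != expected)


expected_total = 750.0
test_users = ["alice", "bob", "carol", "lin"]

# Users for whom each implementation violates the stated requirement.
print(users_with_unexpected_results(total_with_rls, expected_total, test_users))
print(users_with_unexpected_results(total_without_rls, expected_total, test_users))
```

Testing only as Alice would pass both implementations; it takes the restricted identities to expose the difference.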
Okay, let’s say you’ve spelled out the requirements. And you’ve done the initial testing to make sure all those different identities are working correctly. Awesome! Life is good! What could possibly go wrong?
Scenario: Expose Data Users can Access
Test on a Regular Basis
The final point is to run tests on a regular basis. What once worked is not guaranteed to keep working. Which tests to run, and how often, depends on the situation, but some sort of test suite should be run before promoting any changes into your production environment.
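A promotion gate of this kind could be sketched as follows. This is only an assumed shape, not a prescribed tool: `run_checks`, `can_promote`, the check names, and the sample data are all hypothetical, and in practice the checks would query your real environment under real test identities.

```python
# Hypothetical data and view, standing in for the real system under test.
LEDGER = [("1234", 100.0), ("5678", 250.0), ("9012", 400.0)]


def company_total(user):
    """Company-wide total that is meant to be identical for every user."""
    return sum(amount for _, amount in LEDGER)


def run_checks(checks):
    """Run each named check and return the names of the ones that fail."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failures.append(name)
    return failures


def can_promote(checks):
    """Allow promotion only when every check passes."""
    return not run_checks(checks)


# Hypothetical regression suite, rerun before every promotion.
CHECKS = {
    "same total for every identity": lambda: len(
        {company_total(u) for u in ("alice", "bob", "lin")}
    ) == 1,
    "total matches the ledger": lambda: company_total("alice") == 750.0,
}

print("promote" if can_promote(CHECKS) else "blocked")
```

Because the suite is just a dictionary of named checks, the same set can be rerun on a schedule as well as before each release.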
Data engineering and analytics projects are complex. Unexpected problems will of course arise, but by following a few straightforward yet oft-overlooked software engineering best practices, namely clarifying requirements and testing from the outset, you can help ensure that your technology investments ultimately succeed. And if you need help with your data engineering projects, the phData team is here to help! Reach out to us at info@phdata.io to get in touch with our experts.