How solid is your data estate?

Time and again, we see data estates built on outdated patterns that run into problems when they try to scale.

In this article, we share four tips to consider when laying the foundations of your data estate.

#1: Beware the siren song of ‘drag and drop’ configuration

It can be very tempting to put together a solution using drag-and-drop tools. They have a low barrier to entry, are quick to edit and can produce fast results. However, they can also:

  • Introduce an ‘iceberg effect’, making complex tasks appear simpler than they are
  • Become difficult to scale in large solutions, as they generally favour manual configuration over automation
  • Make it a challenge to enforce consistency across a solution

#2: DevOps is the path

DevOps has long been the gold standard for application development. However, we still see slow uptake in data estates, which leads to manual, error-prone deployments that add stress and reduce the pace of change. DevOps has proven to:

  • Shorten development cycles
  • Reduce implementation failures
  • Improve communication and cooperation
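A practical first step towards DevOps in a data estate is to treat transformations as testable code that a CI pipeline can verify on every commit, before anything is deployed. A minimal sketch of that idea (the `clean_orders` function and its rules are illustrative, not taken from any specific product):

```python
# Illustrative only: a unit-testable data transformation, the kind of
# check a CI pipeline would run automatically before each deployment.

def clean_orders(rows):
    """Drop rows without an order id and normalise amounts to floats."""
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue  # reject records with no business key
        cleaned.append({
            "order_id": row["order_id"],
            "amount": float(row.get("amount", 0)),
        })
    return cleaned


def test_clean_orders_drops_bad_rows():
    raw = [
        {"order_id": "A1", "amount": "19.99"},
        {"order_id": "", "amount": "5.00"},  # no key: should be dropped
        {"order_id": "A2"},                  # missing amount defaults to 0
    ]
    result = clean_orders(raw)
    assert [r["order_id"] for r in result] == ["A1", "A2"]
    assert result[0]["amount"] == 19.99
    assert result[1]["amount"] == 0.0


test_clean_orders_drops_bad_rows()
```

Once checks like this run automatically, a failed deployment becomes a failed test on a branch rather than a broken pipeline in production.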

#3: Consider Spark

You do not need a ‘big data’ workload to benefit from using Apache Spark as your data transformation engine. Spark enables:

  • The combination of ‘set-based’ logic (i.e. SQL queries) with ‘imperative’ logic (e.g. Python code). This gives your developers a consistent mechanism for any data transformation, regardless of its complexity
  • The combination of real time and batch transformation using a unified processing engine
  • Close collaboration between your data scientists and data engineers. Historically, these roles have worked in different toolsets; with Spark, they work together on a common platform
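In Spark, the set-based and imperative styles meet through `spark.sql` and Python functions over the same data. As a dependency-free sketch of that pattern, the snippet below mixes a declarative SQL aggregation with imperative Python post-processing using the standard library's `sqlite3` — a stand-in for a Spark session, purely to illustrate the idea:

```python
import sqlite3

# Stand-in for a Spark session: an in-memory SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 50.0)],
)

# Set-based step: a declarative SQL aggregation.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Imperative step: arbitrary Python logic applied to the result.
def banded(total):
    return "high" if total >= 100 else "low"

report = {region: banded(total) for region, total in totals}
print(report)  # {'north': 'high', 'south': 'low'}
```

The value of Spark is that both steps run on one engine at any scale, so developers never have to switch tools when the logic outgrows what SQL alone expresses cleanly.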

#4: Look towards automation

We created the product ‘LakeFlow’ to help you rapidly build resilient data estates. LakeFlow is a data engineering service that will:

  • Deploy a data estate within your Azure environment using only Azure first-party components
  • Generate pipelines and onboard new data sources to your data estate quickly. This allows you to focus on your dashboards and insights
  • Automatically maintain a historical record of your data, in a cost-effective data lake
  • Proactively monitor your pipelines, picking up anomalies in data volume flows before failures occur
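Volume-based monitoring of the kind described in the last bullet can be as simple as flagging a day whose row count sits far outside the recent average. A minimal sketch using a generic z-score check (an illustration of the technique, not LakeFlow's actual implementation):

```python
from statistics import mean, stdev

def is_volume_anomaly(history, today, threshold=3.0):
    """Flag today's row count if it is more than `threshold`
    standard deviations away from the recent mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# Recent daily row counts for one pipeline (illustrative numbers).
daily_rows = [10_100, 9_950, 10_230, 10_080, 9_900, 10_150]

print(is_volume_anomaly(daily_rows, 10_020))  # a normal day: False
print(is_volume_anomaly(daily_rows, 2_000))   # feed dropped off: True
```

Catching the drop in volume on the day it happens means the problem is fixed before downstream dashboards silently go stale.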

If you would like to know more, or need assistance in building rock-solid data estates, contact us.
