Tales from the T-SQL Tuesday Archive: #012 Why are DBA Skills Necessary

Posted by John Q. Martin on Tue, Feb 27, 2024
T-SQL Tuesday Logo

For this post I’m diving into the T-SQL Tuesday archives for inspiration, and I’m going to be discussing Paul Randal’s topic of Why DBA Skills are Necessary. Originally, this topic was posted for discussion back in October 2010 and you can find Paul’s summary post with the contributions from the original posters here.

Given that there has been over a decade since the original topic was discussed I thought it would be interesting to look back on how my views have evolved over the years. Back in 2010 I was still very early on in my data career and had not really very aware of the community, I had only attended my first SQL Bits conference in Manchester the year prior.

So, lets start by saying that I think that DBA skills are essential to building, managing, developing or just about any facet of dealing with a data platform today. You’ll note that I say data platform rather than simply database. I make this distinction because while in 2010 we were dealing with very large databases the technology landscape was very different. Azure was around but still very early on it’s lifecycle with SQL Azure Relational Database announced in 2009 (you can check out the history in wikipedia here), so we were predominantly on-premises with SQL Server, Oracle, Sybase, and others. Contrast that with today where we still have the on-premises options but we also have a very rich cloud ecosystem which goes well beyond just the relational. Document databases, Spark, Presto, and many other data management and query engines make up the estates we manage. However, the core DBA skills we used in 2010 still apply today when we interact with these systems.

Thinking about what those DBA skills are, well. While the breadth of technology has expanded and the scope of what a data platform administrator is responsible for has grown, we can still group them into broad categories which underpin our work. These are availability of the data, security of the data, and performance of the workloads.

Availability

Under the availability banner we need the skills to ensure that the data needed by our customers is available to them when they need it. Back in 2010 that often meant understanding how to build and manage SQL Server clusters for me, configuring log shipping or replication for reporting data stores. This also meant looking below the database engine to managing multi-pathing at the network layer, Fibre Channel configuration, disk groups, and RAID configurations within the SAN. Today, these skills are still needed for on-premises and IaaS data platform systems, albeit that SAN management has come on leaps and bounds. When I compare the old EMC Clarrion arrays I worked with in 2011 Vs. the Pure Storage arrays of today the provisioning and complexities of deployment are largely gone.

Contrast that with today where there are many PaaS options available to us which abstract a lot of the lower level complexities away but replace them with a broader spectrum of knowledge required to build and manage a data platform. Deploying and managing an Azure SQL Database is relatively easy for a single region. But, setting up a configuration for multi-region resiliency and plumbing in all of the services around it to make sure that it is where you need it when you need it still has complexities because of the range of services which need to be used. The breadth of knowledge of the DBA has really expanded, not only for needing to understand cloud storage, networking, and identity, but also the range of database engines. it’s not uncommon today to see Database Admins being asked to manage SQL Server, PostgreSQL, Document Databases, and MPP analytics systems like Databricks which is based on Spark.

Availability also covers our old friend backup and restore, which was really an art form back 2010 for Oracle and SQL Server. There were so many options available once we understood the RPO and RTO (if we managed to get them from the business). Being able to recover databases was something that I had plenty of practice on in my time, and luckily only had three occasions when I needed to recover failed systems.

Looking at the options now, most of this is built-in when it comes to cloud based-systems and the SLA’s provided by the big cloud providers are set and clear for people to see. Whether it is Azure SQL Database, Amazon RDS, or the multitude of blob storage which underpins the likes of Databricks or Microsoft Fabric. I have to say that this side of the database administration is something that I see as having really benefited from cloud because of the way that the business clearly understands what the SLAs are.

Security

Security is an interesting topic. Yes the DBA is responsible for the security of the database systems, but ultimately it is a collaborative responsibility between the DBA and the Developer. With the Developer being responsible for the security model within the database and the DBA for the infrastructure. I remember when there were many heated discussions about the DBA saying no and being the gatekeeper. In my experience that part of the role has evolved and we have achieved that collaborative state which we should be in.

The range of options for securing systems has not really changes that much in that we can use an external authentication provider such as Active Directory or IAM, or one which is part of the database platform. I remember the perceived insecurity of SQL authentication compared to AD authentication, when in reality the insecurity was not with SQL Server but the insecure practices of storing and using the username and password for applications. One system I worked with had the username and password embedded in the page source for a web page… With the advent and development of secret management systems which can be integrated with applications this problem has largely gone away and it becomes a choice of which option has the lower management overhead.

I would talk about advances for encryption of data via Always Encrypted or similar, but that is where the collaborative aspect kicks in and quite frankly that is the developer choice to implement and use. Much like the use of roles, views, functions, procedures, etc. within the database. Ideally we as the DBA just need to drop an account into a role and walk away.

Performance

Performance is an interesting topic, this is the aspect which has changed the least over the years in my view. We will still get told the system is slow, be given vague information, and told the make it faster. We still have to follow the same process of troubleshooting, the only real thing that has changed with the move away from on-premises systems is that we now have a lot more instrumentation and telemetry built into the platforms so we actually have historical performance data to look at. Whereas previously we had to roll our own or buy something like Redgate SQL Monitor.

Ultimately it is still a case of finding the bottleneck out of Compute, Memory, Storage, or Network and then figuring out how to optimise it so the bottleneck moves but an acceptable level of performance is achieved. The rise of MPP systems over time like Hadoop, Spark, Redshift, or Dedicated SQL Pools has meant that query performance tuning has got a little more complex because of needing to understand how wide Vs narrow operations can result in data movement between processing nodes. But again, it is still a resource consumption Vs availability question.

One aspect of performance which has really started to come to the fore is that of cost. In a world where we are charged by the CPU on premises, or by the CPU time in the cloud it is paramount that our systems are efficient. I recently read the book “End of Abundance in Tech” by Ben DeBow which really dives into how workload efficiency can have a tangible impact on the cost of our data platforms when everything is chargeable. As someone building and managing data platforms if I can tune the workload so that it costs less then I can clearly demonstrate to my customer or employer of the impact I am having in a way they will clearly understand.

Summary

Thanks for sticking with me on this topic, it’s something that I can discuss at length as I have a lot of thoughts on the topic.

I firmly believe that a good DBA is worth their weight in gold, and that the key skills around making sure that systems are available, secure, and perform should be part of every data professional’s toolbox. I’m really interested to see how the cost optimisation side of data platform administration grows, because it’s not cheap these days and we can’t simply throw hardware at a problem if we don’t have any hardware.

I would also encourage anyone working with data that they expand these core skill areas. Whether you are a full stack developer, or a low-code/no-code developer, these will help you build and manage better outcomes and deliver value for the stakeholders who buy or use your work.



comments powered by Disqus