Ask Our Experts Series: Dive Into Databases with Zachary Fowler
Welcome to Cloud303's first installment of our "Ask Our Experts" series! Each week, we'll connect you with a leading expert in a key tech field to get their insights, advice, and experience. This week, Cloud303's Solutions Architect Jonathan Fernandez interviews Senior Solutions Architect Zachary Fowler, a veteran in IT with decades of experience under his belt. His extensive background and specialization in databases have led him to great heights, including the notable accomplishment of founding HarperDB, a cutting-edge database management platform.
Now let's dive in!
Q&A with Zachary Fowler
Q: What is your overall approach when it comes to understanding databases?
A: Databases should not be overly differentiated from other IT services. Like any back-end or front-end service, the system's operational capacity is largely determined by its size, volume, and the resources it calls upon to retrieve information. These are bi-directional factors that affect both input and output.
Q: What are key factors we should consider regarding data exchange with the database?
A: It's crucial to understand how large and how frequent data exchanges are. Unless you are dealing with IoT or high-transaction systems, the focus should be on how many connections the system can manage. This often depends on how much RAM the system can allocate to accepting connections, doing the work, and returning a response.
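To make the RAM point concrete, here is a rough back-of-the-envelope sketch in Python. The function name and every figure in it are hypothetical, chosen only for illustration. Real per-connection memory varies widely by database engine and workload, so treat this as a way of thinking rather than a sizing formula.

```python
def max_connections(total_ram_mb: int, reserved_mb: int, per_connection_mb: int) -> int:
    """Estimate how many connections fit in the RAM left after fixed overhead.

    All inputs are assumptions you would measure on your own system:
    reserved_mb covers the OS, caches, and buffers; per_connection_mb is
    the average memory one connection uses while doing its work.
    """
    if per_connection_mb <= 0:
        raise ValueError("per_connection_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Never report a negative connection count if overhead exceeds total RAM.
    return max(0, available_mb // per_connection_mb)


# Hypothetical server: 16 GB of RAM, 4 GB reserved, 10 MB per connection.
estimate = max_connections(total_ram_mb=16384, reserved_mb=4096, per_connection_mb=10)
print(estimate)
```

If the estimate comes out lower than the number of clients you expect, a connection pool in front of the database is one common way to close the gap.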
Q: What role do data models and application request methods play?
A: Once a request enters the system, it's time to dig into the nitty-gritty of database configuration. Considerations include how the data model and application request methods are utilized, and the usage of sub-selects, joins, and WHERE conditions. Running EXPLAIN on long-running queries can help determine whether the initial WHERE condition is eliminating as many rows as possible, which is often the quickest way to improve performance.
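As a small illustration of that workflow, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The tables, columns, and data are made up for the example, and other databases expose EXPLAIN with different syntax and output. It prints the plan for a join, then counts how many rows each WHERE condition leaves on its own, which hints at which condition filters the most.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
""")
# Hypothetical data: 100 customers (1 in 10 in the west), 1000 orders (half open).
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "west" if i % 10 == 0 else "east") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 100 + 1, "open" if i % 2 else "closed") for i in range(1000)],
)


def explain(sql, params=()):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


def rows_matching(table, condition, params=()):
    """Count the rows a single WHERE condition leaves behind."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {condition}"
    return conn.execute(sql, params).fetchone()[0]


query = """
    SELECT o.id
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE c.region = ? AND o.status = ?
"""
plan = explain(query, ("west", "open"))
for step in plan:
    print(step)

west_customers = rows_matching("customers", "region = ?", ("west",))
open_orders = rows_matching("orders", "status = ?", ("open",))
print(west_customers, open_orders)
```

Here the region condition keeps 10 of 100 customers while the status condition keeps 500 of 1000 orders, so filtering on region first discards far more rows up front.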
Q: What are some considerations for ensuring efficient data scans?
A: Indexing columns that are used frequently is a best practice to help scan large tables efficiently. However, be aware of the limitations. If a user is executing an ad hoc query, a filter on an unindexed column can use up considerable resources, because the database has to read every row, and that gets expensive when there are many rows of data.
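The difference an index makes can be seen in a minimal sketch with Python's built-in sqlite3 module. The table and index names are hypothetical, and the exact plan wording depends on the SQLite version, but the pattern holds: before the index the plan is a full scan, and after it the plan becomes an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 500, "x") for i in range(5000)],
)


def plan_detail(sql):
    """Return the description column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM events WHERE user_id = 42"

before = plan_detail(query)  # no index yet: every row must be read
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
after = plan_detail(query)   # the planner can now look rows up via the index

print("before:", before)
print("after: ", after)
```

The same check works for ad hoc queries: if the plan for a filter on a large table starts with a scan, that query will read the whole table.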
Q: What are your thoughts on No-SQL databases and other emerging technologies?
A: No-SQL databases are excellent for data display on front-ends and smaller applications, though they currently don't quite compete with the capabilities of Relational DBs. Other notable technologies are Columnar Stores that allow for efficient scans on columns. MongoDB, while useful, has its limitations with the number of collections and the searchability of nested objects.
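To show why columnar stores scan columns efficiently, here is a toy Python sketch that holds the same hypothetical records in a row layout and in a column layout. It is only a mental model, not how any particular product stores data on disk. In the column layout, summing one field touches a single list instead of visiting every record.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "amount": 120.0},
    {"id": 2, "region": "west", "amount": 75.5},
    {"id": 3, "region": "east", "amount": 30.0},
]

# Column-oriented layout: each field is stored as its own contiguous list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every record is visited just to read one field.
row_total = sum(row["amount"] for row in rows)

# Column layout: only the "amount" list is read.
column_total = sum(columns["amount"])

print(row_total, column_total)
```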
Q: What challenges do you frequently see with distributed databases?
A: Keeping databases in sync is a major challenge in the industry. Existing tools and centralized sync locations can help, but keeping sync latency low is vital. High availability and redundancy are similar challenges. Managed services usually handle these issues, but in other cases, mechanisms need to be set up to switch traffic to a secondary database when necessary.
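A mechanism for switching traffic to a secondary can be as simple as trying endpoints in priority order. The Python sketch below assumes hypothetical host names and a health check that you supply yourself. A production setup would also need to handle replication lag and avoid flapping between databases.

```python
from typing import Callable, Sequence


def choose_endpoint(endpoints: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy endpoint, checking them in priority order."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy database endpoint available")


# Hypothetical hosts; the health check is faked with a set of hosts that are down.
down = {"primary.db.internal"}
target = choose_endpoint(
    ["primary.db.internal", "secondary.db.internal"],
    is_healthy=lambda host: host not in down,
)
print(target)
```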
Q: Can you share some of the cutting-edge solutions you're exploring?
A: I have been working on a solution that uses a mesh network to broadcast a request to multiple systems, which alleviates the need for databases to sync. Additionally, I see potential in Non-Volatile Memory (like NVMe drives) as persistent storage, which offers tremendous speed for handling vast amounts of data.
Q: How important is understanding SQL and different data models in the industry?
A: SQL is a fundamental language in database management, and while it can be difficult to learn, understanding the basics is essential. However, learning other people's data models can be challenging unless you have a lot of time. Each industry has its own set of naming conventions that can take time to fully comprehend.
Q: What's your take on the AWS Database Specialty certification?
A: Much of the AWS Database Specialty certification is based on understanding their pricing and the limitations of each system in terms of high availability, disaster recovery, and failover times. They aren't testing you so much on databases as they are on the capabilities and pricing of their services.
Q: Can you touch a bit on the realm of data science and analytics?
A: Data science and analytics is a world of its own when dealing with large amounts of data. Techniques like MapReduce distribute a query so that each worker processes a subset of the data and returns its results to a coordinator. However, they usually involve distributed databases that require a central data lake or warehouse in order to ask questions across many different data sets.
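The MapReduce idea can be sketched in a few lines of single-process Python. The partitions and record values below are made up, and a real system would run the map step on separate machines. Each partition is summarized independently, and a coordinator merges the partial results.

```python
from collections import Counter
from functools import reduce

# Hypothetical log statuses, already split into partitions (one per worker).
partitions = [
    ["error", "ok", "ok"],
    ["ok", "error", "timeout"],
    ["ok"],
]


def map_partition(records):
    """Map step: each worker counts statuses in its own subset of the data."""
    return Counter(records)


def merge(left, right):
    """Reduce step: the coordinator combines two partial counts."""
    return left + right


partials = [map_partition(partition) for partition in partitions]
totals = reduce(merge, partials, Counter())
print(dict(totals))
```

Because each partial count is small compared with the raw records, only a little data has to travel back to the coordinator.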
Databases are an integral part of IT infrastructure, and understanding their mechanics can make a massive difference in your operations. Considerations range from the size and frequency of data exchanges to how your data models and request methods are configured. Emerging technologies provide new opportunities, but also new challenges to tackle.
Glossary and Jargon Explainer:
IoT: Internet of Things. It refers to the billions of physical devices connected to the internet, all collecting and sharing data.
RAM: Random Access Memory. It's the main memory of the computer where the operating system, applications, and data in current use are kept so they can be quickly reached by the device's processor.
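Indexing: A database optimization technique that speeds up data retrieval by maintaining a lookup structure on one or more columns, at the cost of extra storage and somewhat slower writes.
No-SQL: Also written NoSQL. A class of databases that store data in models other than relational tables, such as documents, key-value pairs, wide columns, or graphs. Many are designed to scale out across multiple servers.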
MongoDB: A source-available cross-platform document-oriented database program. It's classified as a NoSQL database program.
Columnar Stores: A type of database management system (DBMS) that stores data by columns rather than by rows. This can speed up queries that read only a few columns from many rows.
Stay tuned for the next episode in our "Ask Our Experts" series. We'd love to know which topics you want us to cover in future posts. Please participate in our weekly poll or leave your suggestions in the comments below. Remember to share this post with your peers to spread the knowledge!