## Database Systems: Assignment topic proposals Please bear in mind that the assignment you choose to work on should help you to - relate the knowledge gained from the lectures to a specific field of application or use case; - demonstrate your understanding of the topic; - discern the available options and alternatives (as far as they have importance to the topic); - apply available tools & techniques in practice; - show the importance of the topic in your own company (without disclosing private information); - apply lateral thinking to this assignment - i.e., considering different perspectives (e.g. commercial, engineering, cultural etc.). Why might the perspectives differ? Choose from the suggested topics below: ### 1. Describe the **Architecture of an RDBMS** (e.g. MariaDB) in the TOGAF Archimate diagramming language (TOGAF) with the [Archi tool](https://www.archimatetool.com/download/). **Understanding of RDBMS**: How does the chosen RDBMS work? What are its key components and how do they interact with each other? How does it handle data storage, retrieval, process management and user connectivity? **TOGAF and Archimate Knowledge**: Do you understand the TOGAF architectural framework and the Archimate diagramming language? How can these be effectively used to represent the architecture of an RDBMS? Which layers apply for the different aspects of the architecture of an RDBMS? **Use of Archi Tool**: How proficient are you with the Archimate modelling language and the Archi tool? Can you effectively use it to create a detailed and accurate Archimate diagram of the RDBMS architecture? **Use Case**: Can you identify a real-world engineering use case where the chosen RDBMS is used? How would the implicit/explicit options rendered in your architecture model support this use case? ### 2. Describe a **PoC (Proof of Concept) for a NoSQL database system** (e.g. Key-Value, Document, Timeseries, In-memory database etc.). **Understanding of NoSQL Database Systems**: What is a NoSQL database system and how does it differ from traditional RDBMS? What are the unique features and use cases of the NoSQL database type selected for this assignment? **Real Engineering Use Case (PoC)**: Set up a database system in an engineering environment to demonstrate the capabilities of your chosen NoSQL database. What specific functionalities will you implement? How will you ensure that your PoC is realistic and relevant? **Evaluation and Analysis**: How will you evaluate the success of your PoC? What metrics will you use? How will you analyze the results and what conclusions can you draw from them? Examples: - [MongoDB — NoSQL Database](https://medium.com/@nirajan_DataAnalyst/mongodb-nosql-database-5cb0050dc760?source=tag_recommended_feed---------40-84----------mysql----------83c15093_6a23_4439_ada6_c9cef3b85633-------) - [Learn How to Use Chroma DB: A Step-by-Step Guide](https://www.datacamp.com/tutorial/chromadb-tutorial-step-by-step-guide) ### 3. Describe **distributed storage options for databases in the Cloud**. See [Distributed Storage in the Cloud](https://www.brighttalk.com/webcast/18708/520456?utm_source=brighttalk-portal&utm_medium=web&utm_campaign=knowledge-feed&utm_term=channel-feed-result-11) by Peter Zaitsev. Illustrate the benefits based on a business scenario.  **Understanding of Distributed Storage**: What is distributed storage and how does it work in the context of cloud databases? What are the key components of distributed storage and how do they interact with each other? **Types of Cloud Databases**: What are the different types of databases that can be used in the cloud, such as SQL and NoSQL databases¹? How do these databases utilize distributed storage? **Cloud Providers and Their Offerings**: What are the distributed storage options offered by different cloud providers? How do these offerings differ in terms of performance, scalability, reliability, and cost? **Real Engineering Use Case**: Can you identify a real-world engineering use case where distributed storage for databases in the cloud is used? How would the architecture of the distributed storage support this use case? ### 4. Design and implement a **Backup and Restore strategy (Backup Plan) for a DBMS**. **Understanding of Backup and Restore**: What is a backup and restore strategy and why is it important for a DBMS? What are the key components of a backup and restore strategy? **Backup Frequency**: How often should backups be taken? How does the frequency of backups impact the recovery process and the amount of data that could potentially be lost in the event of a failure? **Backup Retention Policy**: What is a backup retention policy and why is it important? How long should backups be kept, and how should this be determined? **Recovery Time Objectives**: What are recovery time objectives and how do they impact the backup and restore strategy? How quickly does the system need to be restored after a failure? **Real Engineering Use Case**: Can you identify a real-world engineering use case where a backup and restore strategy for a DBMS is used? How would the backup and restore strategy support this use case? **Company Background**: How can you correlate your insights to your own company background? Can you describe best practices without disclosing internal information? ### 5. Analyze the **performance of a DBMS** (e.g. MariaDB, PostGreSQL, Cockroach...) and identify performance bottlenecks. **Understanding of Database Performance**: How can you identify improvement opportunities (including indexes, queries, clusters, schemas)?  **Performance Scope and Objectives**: Develop and implement strategies to optimize the performance of the database system. **Performance Tools and Techniques**: What open-source tools are available? How do you qualify them (based on which criteria)? Provide an exemplary utility analysis (NWA). **Real Engineering Use Case**: Illustrate your insights with real-world experiences. ### 6. Design and implement a **Security Policy for a DBMS**. **Understanding of Security Policy**: What is a security policy in the context of a DBMS? Why is it important and what are its key components? Describe authentication & authorization options available in the DBMS. **Threat Modeling**: What are the potential threats to a DBMS? How can a security policy help mitigate these threats? **Access Control**: How will the security policy handle access control? Who will have access to what data and under what circumstances? **Data Protection**: How will the security policy ensure data protection? What measures will be put in place to prevent data breaches and leaks? **Real Engineering Use Case**: Illustrate your insights in a real-world scenario. How would the security policy support this use case? ### 7. **Threat Modeling / Vulnerability Management**. **Understanding of Threat Modeling and Vulnerability Management**: What is threat modeling and vulnerability management? Why are they important in the context of information security? **Identification of Threats and Vulnerabilities**: How will you identify potential threats and vulnerabilities in your chosen DBMS? What tools or methodologies will you use for this purpose? **Mitigation Strategies**: Once threats and vulnerabilities have been identified, how will you mitigate them? What strategies or controls will you put in place to reduce the risk? What can a vulnerability management process look like? **Real Engineering Use Case**: How does your company handle threat scenarios? Can you identify a real-world engineering use case where threat modeling and vulnerability management are applied? Describe best practices (without disclosing internal information). ### 8. **Database Auditing**. **Understanding of Database Auditing**: What is database auditing and why is it important? What are the key components of a database auditing process? **Audit Scope and Objectives**: What are the scope and objectives of the audit? How can a DBMS (e.g. MariaDB) be audited for compliance and regulation purposes? **Audit Tools and Techniques**: What database tools and techniques are available to conduct the audit and to generate reports of activity? How will you ensure that your audit is thorough and accurate? **Real Engineering Use Case**: Can you identify a real-world engineering use case where database auditing is applied? How would the auditing process support this use case? ### 9. Describe and document a **PEN testing** plan for DBMS. **Understanding of Penetration Testing**: What is penetration testing (PEN testing) and why is it important for a DBMS? What are the key components of a PEN testing plan? **Scope of the PEN Test**: What is the scope of the PEN test? What areas does penetration testing cover, in a database environment (and which not)? What are the objectives, and what should be the outcome, of a PEN test? **PEN Testing Techniques**: What techniques will you use to conduct the PEN test? Will you use automated tools, manual techniques, or a combination of both? **Real Engineering Use Case**: Illustrate your insights in a real-world scenario where PEN testing of a DBMS is applied. Illustrate your results based on a standard tool. Can you recommend open-source candidates (based on which criteria)? ### 10. Set up a **Database Cluster** using a specific software solution (such as clustering solutions for MySQL, MariaDB, Cockroach DB, PostgreSQL, Percona) on Containers. **Understanding of Database Clustering and Containers**: What is a database cluster and how does it work? What are containers and why are they useful for setting up a database cluster? **Choice of Software Solution**: Why have you chosen the specific software solution for your database cluster? Describe the configuration options that are available for achieving high availability, scalability, and performance with this solution. **Implementation Details**: How will you set up the database cluster on containers? What steps will you need to take, and what challenges do you anticipate? **Real Engineering Use Case**: Illustrate your findings with your implementation of choice. How would the database cluster support a real-world business use case? ### 11. Describe the **Best Coding Practices for transactional database programming** based on the distributed CockroachDB. Refer to “Enterprise Application Development with CockroachDB” (Parts [1](https://university.cockroachlabs.com/courses/course-v1:crl+client-side-txn-handling+self-paced/about?mkt_tok=MzUwLVFJTi04MjcAAAGQ20visPrk5C6MuDnC-M2QgBDPMTo-JQ2Dt9pGMKNh9u6Nf5MlzeTqmz0L305eekwcslCKQsvdy4xs825i0djTkcMElUPy1qAtUIuWNWkje90), [2](https://eur06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ffriends.cockroachlabs.com%2FMzUwLVFJTi04MjcAAAGQ20visMBZzU37yvzG1xxfZpr5O6hQDVzJm_yeenIFVQqpid9W1XqS7gk5OXRYBg4z-sqizhM%3D&data=05%7C02%7Cfrank.neubueser%40eviden.com%7C9bb68089938c468c94ed08dc1cfa6a25%7C7d1c77852d8a437db8421ed5d8fbe00a%7C0%7C0%7C638417109050346375%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=aRBj57qyKRdWV5rP5lkQM%2FVxEP9F5X2QadGd0ItMaoQ%3D&reserved=0) and [3](https://eur06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ffriends.cockroachlabs.com%2FMzUwLVFJTi04MjcAAAGQ20visBz6LCJY3OW8cUp7x9iHDSZQ5xDAkfTl-yQd5C1GdCAjjnv0HRAcHwQ4AC8EeJzc988%3D&data=05%7C02%7Cfrank.neubueser%40eviden.com%7C9bb68089938c468c94ed08dc1cfa6a25%7C7d1c77852d8a437db8421ed5d8fbe00a%7C0%7C0%7C638417109050353519%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=21d3PMGWj7UcUy2m2d0cvS3OMXXXwHEVRHAVka31FZE%3D&reserved=0)). **Understanding of Transactional Database Programming**: What is transactional database programming and why is it important? What are the key components of a transaction in a (distributed) relational database? **ACID Properties**: How will you ensure that your database transactions adhere to the ACID (Atomicity, Consistency, Isolation, Durability) properties? What mechanisms or techniques will you use to achieve this? What effect do isolation levels have on application performance, contention and data consistency? **Error Handling**: How will you handle errors and exceptions in your transactional database programming? What strategies will you use to ensure data integrity in the event of a failure? **Real Engineering Use Case**: What system components, languages, setups etc. did you use to execute this training path? Can you identify a real-world engineering use case where transactional database programming is applied? How would the transactional programming support this use case? Give a **critical review** of the practices. Where are the challenges / trade-offs for implementing transaction management in modern distributed database systems? ### 12. **Recursive Common Table Expressions (CTEs)** for data traversal, iteration and generation in SQL. **Understanding of Recursive CTEs**: What are Recursive Common Table Expressions (CTEs) and how do they work in SQL? What are the use cases for Recursive CTEs? **Implementation Details**: How will you implement Recursive CTEs in SQL? What challenges do you anticipate and how will you overcome them? **Real Engineering Use Case**: Can you identify a real-world engineering use case where Recursive CTEs would be beneficial? How would the Recursive CTEs support this use case? Example: [10 SQL Queries with Recursive Common Table Expressions (CTEs) | by SQL Fundamentals | May, 2024 | Medium](https://medium.com/@sqlfundamentals/10-sql-queries-with-recursive-common-table-expressions-ctes-664769ebe16f) ### 13. **Load Testing / Performance Benchmarking with mysqlslap**. Reference: [How To Measure MySQL Query Performance with mysqlslap](https://www.digitalocean.com/community/tutorials/how-to-measure-mysql-query-performance-with-mysqlslap) **Understanding of Load Testing and Performance Benchmarking**: What is load testing and performance benchmarking in the context of a DBMS? Why are they important and what are their key components? **Understanding of mysqlslap**: What is mysqlslap and how does it work? What are the key features of mysqlslap that make it suitable for load testing and performance benchmarking? **Implementation Details**: How will you use mysqlslap to conduct load testing and performance benchmarking? What specific tests will you run and what metrics will you measure? **Real Engineering Use Case**: Can you identify a real-world engineering use case where load testing and performance benchmarking with mysqlslap would be beneficial? How would the use of mysqlslap support this use case?