Advanced DBMS

Advanced DBMS (ADBMS) refers to a set of technologies and concepts that build upon the basics of traditional DBMS to handle the complex and demanding requirements of modern applications and organizations. These systems provide advanced features and capabilities to manage data more efficiently, ensure high availability, and support complex queries and analytics. Here are some key aspects of advanced DBMS.

Transaction Processing Concepts

Transaction processing is a fundamental concept in database management systems (DBMS) that ensures the reliability, consistency, and integrity of data operations. It involves grouping database operations into logical units called transactions, which are executed as indivisible entities. The primary goal of transaction processing is to maintain the ACID properties, which stand for Atomicity, Consistency, Isolation, and Durability. The four properties are:

  1. Atomicity.
  2. Consistency.
  3. Isolation.
  4. Durability.
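To make the atomicity idea concrete, here is a minimal sketch in Python (the class and variable names are invented for illustration, not taken from any particular DBMS): changes are buffered in a private workspace and either installed all at once on commit or discarded on rollback.

```python
import copy

class SimpleTransaction:
    """Illustrative transaction: changes become visible only on commit (atomicity)."""

    def __init__(self, database):
        self.database = database                  # shared, committed state
        self.workspace = copy.deepcopy(database)  # private working copy

    def write(self, key, value):
        self.workspace[key] = value               # buffered, not yet durable

    def commit(self):
        self.database.clear()
        self.database.update(self.workspace)      # install all changes at once

    def rollback(self):
        self.workspace = copy.deepcopy(self.database)  # discard all changes

db = {"A": 100, "B": 50}
t = SimpleTransaction(db)
t.write("A", 70)
t.write("B", 80)        # transfer 30 from A to B as one unit
t.commit()
print(db)               # both updates applied together: {'A': 70, 'B': 80}
```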

Transaction States

During its execution, a transaction passes through several states. These states inform the DBMS of the transaction's current status and determine how the transaction is scheduled for further processing. They define the rules that decide the fate of a transaction: whether it is committed or aborted.

The ROLLBACK statement cancels the changes made by the current transaction; once a COMMIT has been performed, the transaction can no longer undo its changes.

The different types of transaction states are:

1. Active State
2. Partially Committed State
3. Failed State
4. Aborted State
5. Committed State
6. Terminated State
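Read together, these states form a small state machine. The sketch below is purely illustrative (the transition table is the usual textbook one) and rejects transitions a DBMS would never allow, such as committing a failed transaction.

```python
# Allowed transitions between transaction states (illustrative).
TRANSITIONS = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             {"terminated"},
    "committed":           {"terminated"},
    "terminated":          set(),
}

def advance(state, next_state):
    """Move a transaction to the next state, rejecting illegal transitions."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

s = "active"
s = advance(s, "partially_committed")
s = advance(s, "committed")
s = advance(s, "terminated")
print(s)  # terminated
```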

Serializability

Serializability ensures that a non-serial (interleaved) schedule is equivalent to some serial schedule. This allows transactions to execute concurrently, with their operations interleaved, while still behaving as if they had run one after another. In simple terms, serializability is a way to determine whether the concurrent execution of two or more transactions preserves database consistency.

Testing of serializability

Serializability is a property that ensures the correctness and consistency of concurrent transactions. It guarantees that the final outcome of executing concurrent transactions is equivalent to executing them in some sequential order.

To test for serializability in an ADBMS, you can use the following methods:

  1. Schedule-based testing.
  2. Conflict serializability testing.
  3. Precedence graph testing.

Conflict Serializability

Conflict serializability refers to the property of a schedule (a sequence of operations executed by concurrent transactions) to produce the same result as if the transactions were executed serially, one after the other, in some order. When a schedule is serializable, the outcome of the concurrent execution is equivalent to some serial execution of the same transactions.

A conflict in the context of serializability occurs when two operations from different transactions access the same data item and at least one of them is a write. Conflicts therefore take three forms: read-write, write-read, and write-write.


View Serializable Schedule

A view serializable schedule is a schedule that is view-equivalent to some serial execution of the same transactions: each transaction reads the same values, and the final write on each data item is the same, as in that serial order. View serializability is a weaker (more permissive) condition than conflict serializability.

To check whether a schedule is view serializable, you typically follow these steps:

  1. Identify the schedule.
  2. Construct the transaction precedence graph.
  3. Check for cycles in the graph.
  4. Determine the conflict serializability.
  5. Validate the view serializability.
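Steps 2 and 3 can be automated. The sketch below (illustrative Python, not tied to any particular system) builds a precedence graph from a schedule of (transaction, operation, item) triples and checks it for cycles; an acyclic graph means the schedule is conflict serializable, which by step 4 also implies view serializability.

```python
def conflicts(op1, op2):
    """Two operations conflict if they touch the same item and at least one is a write."""
    (_, a1, x1), (_, a2, x2) = op1, op2
    return x1 == x2 and ("W" in (a1, a2))

def precedence_graph(schedule):
    """Add an edge Ti -> Tj whenever an op of Ti conflicts with a later op of Tj."""
    edges = set()
    for i, op1 in enumerate(schedule):
        for op2 in schedule[i + 1:]:
            if op1[0] != op2[0] and conflicts(op1, op2):
                edges.add((op1[0], op2[0]))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in graph if n not in visited)

# Schedule: (transaction, operation, data item)
schedule = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
edges = precedence_graph(schedule)
print(edges)             # {('T1', 'T2'), ('T2', 'T1')}
print(has_cycle(edges))  # True, so this schedule is not conflict serializable
```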

Recoverability

Recoverability refers to the system's ability to recover from failures and disruptions without losing committed work or leaving the database in an inconsistent state. In terms of schedules, a schedule is recoverable if every transaction commits only after all the transactions whose data it has read have committed.

Recoverability in an advanced DBMS typically encompasses several aspects:

  1. Fault tolerance.
  2. Data backup and restoration.
  3. System recovery.
  4. Disaster recovery.
  5. Testing and validation.

Transaction Failure Recovery

Recovery from transaction failures typically involves identifying and addressing the cause of the failure and taking appropriate actions to ensure data consistency and system integrity. Here are some common steps involved in recovering from transaction failures in an advanced DBMS:

  1. Error Detection.
  2. Transaction Rollback.
  3. Error Handling and Logging.
  4. Error Resolution and Retry.
  5. Redundancy and Failover.
  6. Monitoring and Alerting.
  7. Data Integrity Checks.

Log-Based Recovery

The log is a sequence of records. Each transaction's log records are kept in some form of stable storage so that, in case of failure, the database can be restored from them.

When an operation is performed on the database, it is logged.

The log record must be written to stable storage before the corresponding change is actually applied to the database.
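This write-ahead rule can be sketched in a few lines of Python (the file name and record format here are invented for illustration): the log record is flushed to stable storage before the in-memory database is touched.

```python
import json, os

LOG_FILE = "wal.log"   # stands in for stable storage

def log_and_apply(db, txn_id, key, new_value):
    """Write-ahead logging: append and flush the log record BEFORE updating the database."""
    record = {"txn": txn_id, "key": key, "old": db.get(key), "new": new_value}
    with open(LOG_FILE, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())   # force the record to stable storage first
    db[key] = new_value          # only now apply the change to the database

db = {"A": 100}
log_and_apply(db, "T1", "A", 70)
print(db)  # {'A': 70}; the log already holds enough information to undo or redo this change
```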

Checkpoint

A checkpoint is a mechanism by which all earlier log records are removed from main memory and stored permanently on stable storage.

Checkpoints work like bookmarks. As transactions execute, their steps are recorded in the log, and checkpoints are marked in that log at intervals.

When a checkpoint is reached, all updates recorded so far are written to the database, and the log up to that point no longer needs to be kept for recovery. The log then continues to record new transaction steps until the next checkpoint.

A checkpoint declares a point at which the DBMS is in a consistent state and all committed transactions have been written to the database.
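A rough sketch of how a checkpoint shortens recovery, assuming a simplified log of (transaction, action) records (the layout is invented for illustration): only transactions that were active at the checkpoint or started after it matter; committed ones are redone and the rest undone.

```python
def recover(log, checkpoint_index):
    """Return the sets of transactions to redo and undo after a crash.

    `log` is a list of (txn, action) pairs, e.g. ("T1", "start"), ("T1", "commit");
    `checkpoint_index` is the position of the last checkpoint record.
    """
    started  = {t for t, a in log[:checkpoint_index] if a == "start"}
    finished = {t for t, a in log[:checkpoint_index] if a in ("commit", "abort")}
    redo, undo = set(), started - finished   # transactions still active at the checkpoint
    for txn, action in log[checkpoint_index:]:
        if action == "start":
            undo.add(txn)
        elif action == "commit":
            undo.discard(txn)
            redo.add(txn)
    return redo, undo

log = [("T1", "start"), ("T1", "commit"), ("T2", "start"),
       ("CKPT", "checkpoint"),
       ("T2", "commit"), ("T3", "start")]          # crash happens here
print(recover(log, checkpoint_index=3))            # ({'T2'}, {'T3'})
```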

Deadlock

A deadlock is a condition in a database system in which two or more transactions are each waiting for a data item locked by another transaction. A deadlock can be represented as a cycle in a wait-for graph: a directed graph whose vertices represent transactions and whose edges represent waits for data items.

For example, in the following wait-for graph, transaction T1 is waiting for item X locked by T3, T3 is waiting for Y, which is locked by T2, and T2 is waiting for Z, which is locked by T1. This creates a cycle of waits, and none of the transactions can proceed.
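The cycle in this example can be found mechanically. A minimal sketch, using the transactions from the example above:

```python
# waits_for[Ti] = Tj means Ti is waiting for an item currently locked by Tj.
waits_for = {"T1": "T3", "T3": "T2", "T2": "T1"}

def find_deadlock(waits_for, start):
    """Follow wait-for edges from `start`; returning to a seen node means deadlock."""
    seen, current = [], start
    while current in waits_for:
        if current in seen:
            return seen[seen.index(current):]  # the cycle of deadlocked transactions
        seen.append(current)
        current = waits_for[current]
    return None

print(find_deadlock(waits_for, "T1"))  # ['T1', 'T3', 'T2']
```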

Concurrency Control Techniques

Concurrency Control

Concurrency control is the set of procedures required to coordinate concurrent operations running on a database.

Before studying concurrency control, however, you should understand concurrent execution of transactions.

Locking Techniques for Concurrency

Concurrency control is a crucial aspect of managing concurrent access to shared resources in a multi-user or multi-threaded environment. Several locking techniques are used to ensure data consistency and prevent conflicts among concurrent transactions or threads. Here are some commonly employed locking techniques for concurrency control:

  1. Binary Locks.
  2. Read/Write Locks.
  3. Multiple Granularity Locking.
  4. Two-Phase Locking (2PL).
  5. Optimistic Concurrency Control (OCC).
  6. Deadlock Detection and Avoidance.
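As an illustration of item 4, two-phase locking, here is a simplified sketch: a transaction may acquire locks only during its growing phase, and once it releases its first lock it may not acquire any more. The class is illustrative; a real lock manager also tracks lock modes and waiting transactions.

```python
class TwoPhaseLockingTxn:
    """Simplified two-phase locking: no new lock may be acquired after the first release."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # becomes True once any lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violated, cannot lock after unlocking")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True    # growing phase is over
        self.locks.discard(item)

t = TwoPhaseLockingTxn("T1")
t.lock("X")
t.lock("Y")     # growing phase
t.unlock("X")   # shrinking phase begins
try:
    t.lock("Z") # illegal under 2PL
except RuntimeError as e:
    print(e)
```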

Timestamp Concurrency Control Protocols

Concurrency control is an essential aspect of database management systems (DBMS) that ensures the integrity and consistency of data when multiple transactions are executed concurrently. Time-stamping protocols are one approach to achieve concurrency control. They use timestamps assigned to transactions to determine their relative order and resolve conflicts. Here are three popular time-stamping protocols:

  1. Timestamp Ordering Protocol.
  2. Thomas’ Write Rule Protocol.
  3. Multiversion Timestamp Ordering Protocol.
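A condensed sketch of the basic timestamp ordering protocol (item 1 above); the read/write timestamp rules follow the standard formulation, while the class and function names are chosen for illustration.

```python
class Item:
    """A data item with the read/write timestamps used by timestamp ordering."""
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp of any transaction that read the item
        self.write_ts = 0   # largest timestamp of any transaction that wrote the item

def read(item, ts):
    if ts < item.write_ts:            # a "younger" transaction already wrote the item
        raise RuntimeError(f"abort: transaction {ts} reads too late")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:  # would invalidate an earlier read or write
        raise RuntimeError(f"abort: transaction {ts} writes too late")
    item.value, item.write_ts = value, ts

x = Item(10)
print(read(x, ts=5))          # ok, read_ts becomes 5
write(x, ts=7, value=20)      # ok, write_ts becomes 7
try:
    write(x, ts=6, value=30)  # an older transaction arrives too late and is aborted
except RuntimeError as e:
    print(e)
```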

Validation Based Protocol

Validation-based protocols, also known as optimistic concurrency control techniques, handle concurrent transactions without locking. Each transaction works on local copies of its data items rather than on the data itself, and its updates are applied to the database only after the transaction has been validated.

Validation-based protocols are performed in three steps:

  • Read Phase
  • Validation Phase
  • Write Phase
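A minimal sketch of these three phases, assuming each transaction remembers the versions of the items it read and validates them just before writing; the version-counter scheme is one common way of implementing optimistic concurrency control, not the only one.

```python
class OptimisticTxn:
    """Read phase: work on local copies. Validation: check nothing changed. Write: install."""

    def __init__(self, store, versions):
        self.store, self.versions = store, versions
        self.read_set = {}     # item -> version seen during the read phase
        self.write_set = {}    # item -> new value, buffered locally

    def read(self, key):
        self.read_set[key] = self.versions[key]
        return self.write_set.get(key, self.store[key])

    def write(self, key, value):
        self.write_set[key] = value            # local copy only

    def commit(self):
        # Validation phase: every item we read must still be at the version we saw.
        if any(self.versions[k] != v for k, v in self.read_set.items()):
            raise RuntimeError("validation failed: transaction restarted")
        # Write phase: install buffered updates and bump versions.
        for key, value in self.write_set.items():
            self.store[key] = value
            self.versions[key] += 1

store, versions = {"A": 100}, {"A": 0}
t = OptimisticTxn(store, versions)
t.write("A", t.read("A") - 30)
t.commit()
print(store)   # {'A': 70}
```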

Multiple granularity

Multiple granularity is the idea of defining locks, and units of data in general, at more than one level of a hierarchy, such as database, file, page, and record. To understand it, it helps to look first at what granularity means for databases in general.

Granularity refers to the level of detail or the size of the units of data or operations within a system. In the context of databases, it typically refers to the level at which data is stored, managed, or accessed. Different levels of granularity can exist within a database system, depending on the needs and requirements of the application.

For example, in a database, you might have different levels of granularity for data storage, retrieval, and processing. At a high level, you might have a database that stores large sets of data, such as customer information. At a lower level, you might have tables within the database that store more specific information about individual customers.
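In locking terms, multiple granularity is usually supported with intention locks: before locking an individual record, a transaction takes intention locks on the database, file, and page that contain it. The sketch below shows the standard compatibility check for the modes IS, IX, S, SIX, and X (the helper function is illustrative).

```python
# Compatibility matrix for multiple granularity locking (True = the modes can coexist).
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "SIX"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "SIX"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "SIX"): False, ("S",  "X"): False,
    ("SIX","IS"): True,  ("SIX","IX"): False, ("SIX","S"): False, ("SIX","SIX"): False, ("SIX","X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "SIX"): False, ("X",  "X"): False,
}

def can_grant(requested, held_modes):
    """A lock request is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("IX", ["IS"]))   # True: intention locks coexist
print(can_grant("X",  ["IS"]))   # False: an exclusive lock conflicts with any other mode
```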

Multi-Version Schemes

A multiversion protocol minimizes read latency by maintaining multiple versions of data items. Each write operation creates a new version of the data item, and whenever a transaction performs a read, the concurrency control manager selects the version that allows the read to succeed without conflicts.

Whenever a write operation creates a new version of the data, that version contains the following information:

  • Content
  • Write_timestamp
  • Read_timestamp
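A sketch of how these fields are used, following the multiversion timestamp ordering rule that a read sees the version with the largest write timestamp not exceeding the reader's timestamp (names are illustrative):

```python
class Version:
    """One version of a data item: content plus write and read timestamps."""
    def __init__(self, content, write_ts):
        self.content = content
        self.write_ts = write_ts
        self.read_ts = write_ts

def mv_write(versions, ts, content):
    versions.append(Version(content, ts))   # every write creates a new version

def mv_read(versions, ts):
    """Pick the version with the largest write timestamp that is <= the reader's timestamp."""
    visible = [v for v in versions if v.write_ts <= ts]
    chosen = max(visible, key=lambda v: v.write_ts)
    chosen.read_ts = max(chosen.read_ts, ts)
    return chosen.content

versions = [Version("v0", 0)]
mv_write(versions, ts=5, content="v1")
mv_write(versions, ts=9, content="v2")
print(mv_read(versions, ts=7))   # "v1": reads the version written at ts=5, no conflict with ts=9
```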

Recovery with Concurrent Transaction

When more than one transaction is in progress, their log records are interleaved. At recovery time it would be difficult for the recovery system to trace back through all the logs and only then begin recovering.

To mitigate this situation, most DBMSs use the concept of “checkpoints”.

We discussed checkpoints under Transaction Processing Concepts earlier in this guide, so you can revisit that section for clarification.

DDBMS Design

Distributed Database Management System (DDBMS) is a software system that manages a database that is spread across multiple computers or nodes in a network. It provides transparent access and efficient management of data in a distributed environment. Here are some key concepts and design considerations for distributed DBMS (DDBMS) in Advanced Database Management Systems (ADBMS):

Concepts of Distributed DBMS:

  1. Data Distribution: In a distributed DBMS, data is partitioned and distributed across multiple nodes. Various techniques like horizontal partitioning, vertical partitioning, and hybrid partitioning can be used to divide the data.
  2. Transparency: One of the key concepts in distributed DBMS is transparency, which aims to hide the distribution of data and the complexity of the system from users and applications. Transparency ensures that users perceive the distributed database as a single logical entity.
     a. Location Transparency: Users are unaware of the physical location of data in a distributed environment.
     b. Fragmentation Transparency: Users are unaware of how data is divided or fragmented across multiple nodes.
     c. Replication Transparency: Users are unaware of data replication, i.e., multiple copies of data stored at different locations.
     d. Transaction Transparency: Users are unaware of the fact that a transaction may involve multiple nodes and the complexities involved in maintaining data consistency.
  3. Data Replication: Replication involves creating and maintaining multiple copies of data on different nodes in the distributed system. It improves data availability, fault tolerance, and performance. However, it also introduces challenges in maintaining data consistency.
  4. Distributed Query Processing: When a query is executed in a distributed DBMS, it needs to be processed and coordinated across multiple nodes that hold the relevant data. The system must optimize query execution to minimize network overhead and maximize performance.
  5. Concurrency Control: Concurrency control ensures that multiple transactions executing concurrently in a distributed DBMS do not interfere with each other and maintain data consistency. Techniques like locking, timestamp ordering, and optimistic concurrency control are used to manage concurrent access to data.
  6. Distributed Transaction Management: A distributed DBMS must ensure that distributed transactions spanning multiple nodes are executed reliably and maintain the ACID (Atomicity, Consistency, Isolation, Durability) properties. It involves protocols for distributed transaction processing, distributed deadlock detection, and recovery mechanisms.

Design Considerations for Distributed DBMS:

  1. Data Fragmentation and Allocation.
  2. Data Replication.
  3. Communication and Network.
  4. Data Consistency and Integrity.
  5. Security and Privacy.
  6. Fault Tolerance and Recovery.
  7. Distributed Query Optimization.
  8. Scalability.

Function of a DDBMS

We expect a DDBMS to have at least the functionality of a centralized DBMS. In addition, we expect it to have the following features:

  • An advanced communications service that provides access to remote sites and allows the transfer of requests and data between sites over a network.
  • An extended system catalog for storing data distribution information.
  • Distributed query processing, including query optimization and remote data access.
  • Advanced security controls to maintain proper authentication/access to decentralized data.
  • Advanced concurrency control to ensure replicated data consistency.
  • Advanced recovery services to account for single site failures and link failures.

Architecture for a DDBMS

The ANSI-SPARC three-level architecture provides a reference architecture for a centralized DBMS. The diversity of distributed DBMSs makes it much harder to present an equivalent, generally applicable architecture. However, it may be useful to present one possible reference architecture that deals with data distribution. The reference architecture shown in the figure consists of the following schemas:

  • A set of global external schemas;
  • A global conceptual schema;
  • Fragmentation and allocation schemas;

Transaction Processing in a Distributed System

A transaction is a logical unit of work consisting of one or more SQL statements executed by a single user. A transaction starts with a user’s first executable SQL statement and ends when that user commits or rolls back.

A remote transaction contains only statements referencing a single remote node. A distributed transaction contains statements that access more than one node.

Data fragmentation

Distributed database systems provide transparency in the distribution of data across databases. This is achieved through a concept called data fragmentation, which means splitting the data across the network and the participating databases. Initially, all databases and data are designed by applying normalization and denormalization according to the standards of the database system. The distributed setting, however, requires this normalized data to be divided further. In other words, the main goal of a DDBMS is to serve each user's data from the nearest location as quickly as possible, so the data in a table is fragmented according to location or user requirements.
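For instance, horizontal fragmentation of a customer table by region might look like the following sketch (the table and region values are invented for illustration); each fragment would then be allocated to the site closest to the users who query it most.

```python
customers = [
    {"id": 1, "name": "Asha", "region": "EU"},
    {"id": 2, "name": "Bram", "region": "US"},
    {"id": 3, "name": "Chen", "region": "EU"},
]

def horizontal_fragments(rows, key):
    """Split a table into row subsets (horizontal fragments) by the value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

# Each fragment is stored at the site nearest its users, e.g. the EU rows at the EU node.
for site, rows in horizontal_fragments(customers, "region").items():
    print(site, rows)
```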

Replication and allocation techniques for distributed system

Replication and allocation techniques are used to improve system performance, fault tolerance, and scalability. Here are some common techniques employed in distributed systems:

Replication

  • Full Replication.
  • Partial Replication.
  • Lazy Replication.

Data Allocation

  • Centralized Allocation.
  • Decentralized Allocation.
  • Hash-based Allocation.
  • Range-based Allocation.
  • Consistent Hashing.
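To illustrate the last item, consistent hashing places both nodes and keys on a hash ring and assigns each key to the first node encountered clockwise from it, so adding or removing a node only relocates the keys adjacent to that node. A compact sketch (node and key names are invented):

```python
import bisect
import hashlib

def h(value):
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        self.ring = sorted((h(node), node) for node in nodes)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first node on the ring."""
        positions = [pos for pos, _ in self.ring]
        index = bisect.bisect(positions, h(key)) % len(self.ring)
        return self.ring[index][1]

ring = ConsistentHashRing(["node-A", "node-B", "node-C"])
for key in ["customer:1", "customer:2", "customer:3"]:
    print(key, "->", ring.node_for(key))
```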

Overview of Concurrency Control and Recovery in Distributed Databases

For concurrency management and recovery purposes, distributed DBMS environments present many challenges that centralized DBMS environments do not. These include:

Dealing with multiple copies of data items. Concurrency control methods are responsible for maintaining consistency among these copies. Recovery methods must ensure that a copy is brought back into line with the other copies if the site storing it fails and later recovers.

Failure of individual sites. A DDBMS should, if possible, continue to operate using the sites that remain up when one or more individual sites go down. When a site recovers, its local database must be brought up to date with the rest of the system before it rejoins.

Communication link failure. The system must be able to cope with the failure of one or more of the communication links connecting the sites. An extreme case of this problem is network partitioning, which splits the sites into two or more partitions, where the sites within each partition can communicate only with one another and not with sites in other partitions.

Distributed commit. Problems can arise when committing a transaction that accesses databases stored at multiple sites if some of those sites fail during the commit process. The two-phase commit protocol (see Section 23.6) is often used to deal with this problem.

Distributed deadlock. Deadlock can occur among transactions at several sites, so deadlock-handling techniques must be extended to take this into account.

Distributed concurrency control and recovery techniques must address these and other issues. In the following subsections, we will look at some of the proposed techniques for handling recovery and concurrency control in DDBMS.
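To illustrate the distributed commit problem mentioned above, here is a stripped-down sketch of the two-phase commit protocol: the coordinator commits globally only if every participant votes yes in the prepare phase. Participant behaviour is simulated here; a real implementation also needs logging and timeout handling.

```python
class Participant:
    """A site taking part in a distributed transaction (vote behaviour is simulated)."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit

    def prepare(self):
        return self.can_commit          # phase 1: vote yes/no

    def commit(self):
        print(f"{self.name}: commit")   # phase 2: apply the global decision

    def abort(self):
        print(f"{self.name}: abort")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    if all(votes):
        for p in participants:                       # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                           # any "no" vote forces global abort
        p.abort()
    return "aborted"

print(two_phase_commit([Participant("site1"), Participant("site2")]))
print(two_phase_commit([Participant("site1"), Participant("site2", can_commit=False)]))
```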

Introduction to OODBMS

An Object-Oriented Database Management System (OODBMS) is a database management system that is specifically designed to work with object-oriented programming languages and models. It combines the principles of object-oriented programming with database management, providing a seamless integration between the programming language and the database.

Traditional relational databases are based on the relational model, which organizes data into tables with rows and columns. In contrast, an OODBMS stores data as objects, which are instances of classes in an object-oriented programming language. These objects can contain attributes (data) and methods (functions or procedures).
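As a minimal illustration of this object model (the class and attribute names are invented), an OODBMS can store application objects like the following directly, instead of flattening them into rows and columns:

```python
class Address:
    def __init__(self, city, country):
        self.city, self.country = city, country

class Customer:
    """An application object: attributes (data) plus methods (behaviour), stored as-is by an OODBMS."""
    def __init__(self, name, address, orders=None):
        self.name = name
        self.address = address          # nested object, no join needed to reach it
        self.orders = orders or []      # a one-to-many relationship held directly

    def total_spent(self):
        return sum(self.orders)

c = Customer("Asha", Address("Oslo", "NO"), orders=[120, 80])
print(c.name, c.address.city, c.total_spent())   # the whole object graph is one stored unit
```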

OODBMSs provide several advantages over traditional relational databases in scenarios where complex data structures, relationships, and behaviors need to be represented. Here are some key features and benefits of OODBMS.