21CSC205P Database Management Systems
UNIT-V
TOPICS
• Storage Structure
• Transaction control
• Concurrency control algorithms and Graph
• Issues in Concurrent execution
• Failures and Recovery algorithms
• Case Study: Demonstration of Entire project by applying all the
concepts learned with minimum Front-End requirements, NoSQL
Database, Document Oriented, Key Value pairs, Column Oriented
Storage Structure
• Overview of Physical Storage Media
• Magnetic Disks
• RAID
• Tertiary Storage
• Storage Access
• File Organization
• Organization of Records in Files
• Data-Dictionary Storage
Physical Storage Media
Classification of Physical Storage Media
1. Cache: fastest and most costly form of storage; volatile; managed by the
computer system hardware.
2. Main memory:
• fast access (10s to 100s of nanoseconds; 1 nanosecond = 10⁻⁹ seconds)
• generally too small (or too expensive) to store the entire database
• capacities of up to a few Gigabytes widely used currently
• Capacities have gone up and per-byte costs have decreased steadily and
rapidly (roughly factor of 2 every 2 to 3 years)
• Volatile: contents of main memory are usually lost if a power failure or system
crash occurs.
3. Magnetic Tape
• Non-volatile, used primarily for backup (to recover from disk failure) and for archival data
• Sequential access – much slower than disk
• Very high capacity (40 to 300 GB tapes available)
• Tape can be removed from the drive; storage costs are much cheaper than disk, but drives are expensive
• Tape jukeboxes available for storing massive amounts of data
• hundreds of terabytes (1 terabyte = 10¹² bytes) to even multiple petabytes
(1 petabyte = 10¹⁵ bytes)
Magnetic Disks
• Read-write head
• Positioned very close to the platter surface
• Reads or writes magnetically encoded information.
• Surface of platter divided into circular tracks
• Over 50K-100K tracks per platter on typical hard disks
• Each track is divided into sectors.
• A sector is the smallest unit of data that can be read or written.
• Sector size typically 512 bytes
• Typical sectors per track: 500 to 1000 (on inner tracks) to 1000 to 2000 (on outer tracks)
• To read/write a sector
• disk arm swings to position head on right track
• platter spins continually; data is read/written as sector passes under head
• Head-disk assemblies
• multiple disk platters on a single spindle (1 to 5 usually)
• one head per platter, mounted on a common arm.
• Cylinder i consists of ith track of all the platters
Magnetic Disks (Cont.)
• Earlier generation disks were susceptible to head-crashes
• Surface of earlier generation disks had metal-oxide coatings which would disintegrate
on head crash and damage all data on disk
• Current generation disks are less susceptible to such disastrous failures, although
individual sectors may get corrupted
• Disk controller – interfaces between the computer system and the disk drive hardware.
• accepts high-level commands to read or write a sector
• initiates actions such as moving the disk arm to the right track and actually reading or
writing the data
• Computes and attaches checksums to each sector to verify that data is read back
correctly
• If data is corrupted, with very high probability stored checksum won’t match
recomputed checksum
• Ensures successful writing by reading back sector after writing it
• Performs remapping of bad sectors
Disk Subsystem
• Multiple disks connected to a computer system through a controller
• Controller's functionality (checksum computation, bad-sector remapping) is often carried out by the individual disks; this reduces the load on the controller
• Disk interface standards families
• ATA (AT adaptor) range of standards
• SATA (Serial ATA)
• SCSI (Small Computer System Interface) range of standards
• SAS (Serial Attached SCSI)
• Several variants of each standard (different speeds and capabilities)
Disk Subsystem (cont.)
RAID levels
• RAID 0
• RAID 1
• RAID 2
• RAID 3
• RAID 4
• RAID 5
• RAID 6
RAID 0
- RAID 0 consists of striping, with no mirroring or parity, and hence no redundancy of data. It
offers the best performance, but no fault tolerance.
- In this level, a striped array of disks is implemented. The data is broken down into blocks and the
blocks are distributed among disks.
- Block “1, 2” forms a stripe.
- Each disk receives a block of data to write/read in parallel.
- Reliability: there is no duplication of data. Hence, a block once lost cannot be recovered.
RAID 1
• RAID 1 is also known as disk mirroring, this configuration consists of at least two drives that
duplicate the storage of data.
• There is no striping. When data is sent to a RAID controller, it sends a copy of data to all the disks
in the array.
• Read performance is improved since either disk can be read at the same time.
• Write performance is the same as for single disk storage. (This level performs mirroring of data in
drive 1 to drive 2. It offers 100% redundancy as array will continue to work even if either disk
fails.)
RAID 2
• RAID2 uses striping across disks, with some disks storing error checking
and correcting (ECC) information.
• This level uses bit-level data striping rather than block level.
• It uses an extra disk for storing all the parity information.
RAID 3
• This technique uses striping and dedicates one drive to storing parity information. So
RAID 3 stripes the data onto multiple disks.
• The parity bit generated for each data word is stored on a different disk. This technique makes
it possible to recover from single disk failures.
• The ECC information is used to detect errors.
• This level uses byte-level striping along with parity.
• One dedicated drive is used to store the parity information, and in case of any drive failure
the data is restored using this extra drive.
• But if the parity drive itself crashes, the redundancy is lost again, so this level is not widely
used in organizations.
RAID 4
• In this level, an entire block of data is written onto data disks and then the parity is generated and
stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-
level striping. Both level 3 and level 4 require at least three disks to implement RAID.
• This level uses large stripes, which means you can read records from any single drive.
• This level is very similar to RAID 3, except that RAID 4 uses block-level striping rather than
byte-level striping.
RAID 5
• This level is based on block level striping with distributed parity.
• The parity information is striped across each drive.
• RAID 5 requires at least three disks, but it is often recommended to use at least five disks for
performance reasons.
• Parity information is written to a different disk in the array for each stripe.
• In case of single disk failure data can be recovered with the help of distributed parity.
• The parity bit rotates among the drives to make the random write performance better.
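The parity mechanism behind RAID levels 3–5 is a bitwise XOR: the parity block is the XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XOR-ing the parity with the surviving blocks. A minimal Python sketch (the block contents below are made-up examples):

```python
from functools import reduce

def parity_block(blocks):
    """XOR all data blocks byte-by-byte to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

def recover_block(surviving_blocks, parity):
    """A lost block is the XOR of the parity with the surviving data blocks."""
    return parity_block(surviving_blocks + [parity])

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity_block(data)
# Simulate losing the second disk and rebuilding its block from the rest:
rebuilt = recover_block([data[0], data[2]], p)
assert rebuilt == data[1]
```

The same XOR property is why a second, independent parity (as in RAID 6) is needed to survive two simultaneous failures: one equation can recover only one unknown.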
RAID 6 (P+Q Redundancy Scheme)
• RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional
fault tolerance. This level requires at least four disk drives to implement RAID.
• The use of additional parity allows the array to continue to function even if two disks fail
simultaneously. However, this extra protection has a higher cost per gigabyte (GB).
• This level is an enhanced version of RAID 5 adding extra benefit of dual parity (2 parity
blocks are created.)
• This level uses block-level striping with dual distributed parity and can survive two
concurrent drive failures in an array, which provides extra fault tolerance and redundancy.
Tertiary Storage
• In a large database system, some of the data may have to reside on tertiary storage.
• The two most common tertiary storage media are optical disks and magnetic tapes.
1. Optical Disks
2. Magnetic Tapes
1. Optical Disks
• Compact disk-read only memory (CD-ROM)
• Removable disks, 640 MB per disk
• Seek time about 100 msec (optical read head is heavier and slower)
• Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s) compared to magnetic
disks
• Digital Video Disk (DVD)
• DVD-5 holds 4.7 GB , and DVD-9 holds 8.5 GB
• DVD-10 and DVD-18 are double sided formats with capacities of 9.4 GB and 17 GB
• Blu-ray DVD: 27 GB (54 GB for double sided disk)
• Slow seek time, for same reasons as CD-ROM
• Record once versions (CD-R and DVD-R) are popular
• data can only be written once, and cannot be erased.
• high capacity and long lifetime; used for archival storage
• Multi-write versions (CD-RW, DVD-RW, DVD+RW and DVD-RAM) also available
File Organization
1. Fixed-Length Records
• Simple approach:
• Store record i starting from byte n × (i – 1), where n is the size of each record.
• Record access is simple but records may cross blocks
• Modification: do not allow records to cross block boundaries
• Deletion of record i – alternatives:
• move records i + 1, . . ., n to i, . . . , n – 1
• move record n to i
• do not move records, but link all free records on a free list.
[Figure: Deleting record 3 and compacting] [Figure: Deleting record 3 and moving the last record]
Free Lists
• Store the address of the first deleted record in the file header.
• Use this first record to store the address of the second deleted record, and so on
• Can think of these stored addresses as pointers since they “point” to the location of a record.
• More space efficient representation: reuse space for normal attributes of free records to store
pointers. (No pointers stored in in-use records.)
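The free-list scheme above can be sketched in a few lines of Python: the file header holds the index of the first deleted slot, and each deleted slot reuses its own space to store the index of the next free slot. All names here are illustrative:

```python
class FreeListFile:
    """Fixed-length record file; deleted slots form a linked free list
    threaded through the file header (slot index, or None if empty)."""
    def __init__(self, records):
        self.slots = list(records)   # in-use records, or next-pointers for free slots
        self.free_head = None        # file-header pointer to the first free slot

    def delete(self, i):
        # Reuse the deleted record's space to store the next-free pointer.
        self.slots[i] = self.free_head
        self.free_head = i

    def insert(self, record):
        if self.free_head is not None:        # reuse a freed slot
            i = self.free_head
            self.free_head = self.slots[i]    # follow the chain
            self.slots[i] = record
            return i
        self.slots.append(record)             # no free slot: grow the file
        return len(self.slots) - 1

f = FreeListFile(["r0", "r1", "r2", "r3"])
f.delete(1); f.delete(3)          # free list now: header -> 3 -> 1
assert f.insert("new") == 3       # the most recently freed slot is reused first
assert f.insert("new2") == 1
```

Note how no pointer space is needed in in-use records: the pointers live only in the space of deleted records, exactly as the slides describe.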
2. Variable-Length Records
• Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields such as strings (varchar)
• Record types that allow repeating fields (used in some older data models).
• Attributes are stored in order
• Variable length attributes represented by fixed size (offset, length), with actual data stored after all
fixed length attributes
• Null values represented by null-value bitmap
Organization of Records in Files
• Heap – a record can be placed anywhere in the file where there is space
• Sequential – store records in sequential order, based on the value of the search
key of each record
• Hashing – a hash function computed on some attribute of each record; the result
specifies in which block of the file the record should be placed
• Records of each relation may be stored in a separate file. In a multitable
clustering file organization records of several different relations can be stored in
the same file
• store related records on the same block to minimize I/O
Sequential File Organization
• Suitable for applications that require sequential processing of the entire file.
• The records in the file are ordered by a search-key
Sequential File Organization (Cont.)
• Deletion – use pointer chains
• Insertion –locate the position where the record is to be inserted
• if there is free space insert there
• if no free space, insert the record in an overflow block
• In either case, pointer chain must be updated
• Need to reorganize the file
from time to time to restore
sequential order.
Multitable Clustering File Organization
• Store several relations in one file using a multitable clustering file organization
Department
Instructor
Multitable clustering
of department and
instructor
Data Dictionary Storage
• The Data dictionary (also called system catalog) stores metadata; that is, data about
data, such as
• Information about relations
• names of relations
• names, types and lengths of attributes of each relation
• names and definitions of views
• integrity constraints
• User and accounting information, including passwords
• Statistical and descriptive data
• number of tuples in each relation
• Physical file organization information
• How relation is stored (sequential/hash/…)
• Physical location of relation
• Information about indices
Relational Representation of System Metadata
• Relational representation on disk
• Specialized data structures designed for efficient access, in memory
Storage Access
• A database file is partitioned into fixed-length storage units called blocks. Blocks
are units of both storage allocation and data transfer.
• Database system seeks to minimize the number of block transfers between the disk
and memory. We can reduce the number of disk accesses by keeping as many
blocks as possible in main memory.
• Buffer – portion of main memory available to store copies of disk blocks.
• Buffer manager – subsystem responsible for allocating buffer space in main
memory.
Buffer Manager
• Programs call on the buffer manager when they need a block from disk.
1. If the block is already in the buffer, buffer manager returns the address of the
block in main memory
2. If the block is not in the buffer, the buffer manager
1. Allocates space in the buffer for the block
Replacing (throwing out) some other block, if required, to make space
for the new block.
Replaced block written back to disk only if it was modified since the
most recent time that it was written to/fetched from the disk.
2. Reads the block from the disk to the buffer, and returns the address of the
block in main memory to requester.
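The two cases above can be sketched with a toy buffer manager in Python. The LRU replacement policy is an assumption (the slides do not name one), and a dirty block is written back to disk only when it is evicted:

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager with LRU replacement; modified (dirty) blocks
    are written back to 'disk' only when they are replaced."""
    def __init__(self, disk, capacity):
        self.disk = disk                  # block_id -> contents (stands in for disk)
        self.capacity = capacity
        self.buffer = OrderedDict()       # block_id -> [contents, dirty?]

    def pin(self, block_id):
        if block_id in self.buffer:       # case 1: already buffered, return it
            self.buffer.move_to_end(block_id)
        else:                             # case 2: fetch from disk, evicting if full
            if len(self.buffer) >= self.capacity:
                victim, (data, dirty) = self.buffer.popitem(last=False)
                if dirty:                 # write back only if modified since fetch
                    self.disk[victim] = data
            self.buffer[block_id] = [self.disk[block_id], False]
        return self.buffer[block_id][0]

    def write(self, block_id, contents):
        self.pin(block_id)
        self.buffer[block_id] = [contents, True]

disk = {"B1": "old1", "B2": "old2", "B3": "old3"}
bm = BufferManager(disk, capacity=2)
bm.write("B1", "new1")
bm.pin("B2")
bm.pin("B3")                  # buffer is full: evicts B1 (LRU) and flushes it
assert disk["B1"] == "new1"   # dirty block was written back on eviction
```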
Transaction control
Database System Concepts - 7th Edition 17.1 ©Silberschatz, Korth and Sudarshan
Transaction Concept
Transaction State
Concurrent Executions
Serializability
Testing for conflict and View Serializability.
Recoverability
Cascading rollback
Cascadeless schedules
Transaction Concept
T1                                T2
1. read(A)
2. A := A – 50
3. write(A)
                                  read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially
• That is, one after the other.
However, executing multiple transactions concurrently has significant
benefits, as we will see later.
ACID Properties
A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database
system must ensure:
Atomicity. Either all operations of the transaction are properly reflected in
the database or none are.
Consistency. Execution of a transaction in isolation preserves the
consistency of the database.
Isolation. Although multiple transactions may execute concurrently, each
transaction must be unaware of other concurrently executing transactions.
Intermediate transaction results must be hidden from other concurrently
executed transactions.
• That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj, finished execution before Ti started, or Tj started execution
after Ti finished.
Durability. After a transaction completes successfully, the changes it has
made to the database persist, even if there are system failures.
Transaction State
Active – the initial state; the transaction stays in this state while it is
executing
Partially committed – after the final statement has been executed.
Failed -- after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database
restored to its state prior to the start of the transaction. Two options after it
has been aborted:
• Restart the transaction
Can be done only if no internal logical error
• Kill the transaction
Committed – after successful completion.
Transaction State (Cont.)
Concurrent Executions
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from
A to B.
A serial schedule in which T1 is followed by T2 :
[Figure: Schedule 2]
[Figure: Schedule 4]
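The serial schedule T1 followed by T2 can be simulated directly in Python; note that the sum A + B is preserved, illustrating the consistency requirement (initial balances of 1000 and 2000 are assumed for concreteness):

```python
db = {"A": 1000, "B": 2000}

def T1(db):                 # transfer $50 from A to B
    a = db["A"]; db["A"] = a - 50
    b = db["B"]; db["B"] = b + 50

def T2(db):                 # transfer 10% of A's balance to B
    a = db["A"]; tmp = a * 0.10
    db["A"] = a - tmp
    db["B"] = db["B"] + tmp

total_before = db["A"] + db["B"]
T1(db); T2(db)              # serial schedule: T1 followed by T2
assert db["A"] + db["B"] == total_before   # consistency: A + B is preserved
assert db["A"] == 855 and db["B"] == 2145
```

Any serial order preserves A + B; the concurrency-control problem is ensuring that interleaved executions behave like one of the serial orders.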
Serializability
[Figures: Schedule 3 and Schedule 6]
View Serializability
Let S and S’ be two schedules with the same set of transactions. S and S’
are view equivalent if the following three conditions are met, for each data
item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in
schedule S’ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was
produced by transaction Tj (if any), then in schedule S’ also
transaction Ti must read the value of Q that was produced by the
same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in
schedule S must also perform the final write(Q) operation in schedule S’.
As can be seen, view equivalence is also based purely on reads and writes
alone.
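Conflict serializability, unlike view serializability, can be tested efficiently: build a precedence graph with an edge Ti → Tj for each pair of conflicting operations in which Ti's operation comes first, and check the graph for a cycle. A small Python sketch, with schedules given as (transaction, operation, item) triples:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item), op in {'R', 'W'}.
    Builds the precedence graph and tests it for a cycle via DFS."""
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and 'W' in (op1, op2):
                edges.add((ti, tj))       # Ti must precede Tj

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node); done.add(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(n, set(), set()) for n in nodes)

# Interleaved writes on Q create a cycle T1 -> T2 -> T1: not serializable.
bad  = [("T1", "R", "Q"), ("T2", "W", "Q"), ("T1", "W", "Q")]
good = [("T1", "R", "Q"), ("T1", "W", "Q"), ("T2", "R", "Q")]
assert not conflict_serializable(bad)
assert conflict_serializable(good)
```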
Recoverability
If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence, database must ensure that schedules
are recoverable.
Lock-Based Protocols
There are a variety of concurrency-control schemes. No one scheme is clearly the best; each
one has advantages. Some of the protocols used are:
Shared-exclusive protocol
• Shared Locks (S) : If transaction locked data item in shared mode then
allowed to read only.
• Exclusive locks (X) : If transaction locked data item in exclusive mode then
allowed to read and write both.
If transaction Ti can be granted a lock on Q immediately, in spite of
the presence of the mode B lock, then we say mode A is compatible
with mode B. Such a function can be represented conveniently by a
matrix.
An element comp(A, B) of the matrix has the value true if and only if
mode A is compatible with mode B.
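The compatibility matrix for the shared-exclusive protocol is small enough to write down directly. Below it is encoded as a Python table, with a helper that checks whether a requested lock can be granted against the set of modes currently held by other transactions:

```python
# comp[A][B] is True iff a new lock in mode A can be granted while
# another transaction holds a mode-B lock on the same data item.
comp = {
    "S": {"S": True,  "X": False},   # shared is compatible only with shared
    "X": {"S": False, "X": False},   # exclusive is compatible with nothing
}

def can_grant(requested_mode, held_modes):
    return all(comp[requested_mode][h] for h in held_modes)

assert can_grant("S", ["S", "S"])      # many readers may share an item
assert not can_grant("X", ["S"])       # a writer must wait for readers
assert not can_grant("S", ["X"])       # readers must wait for a writer
```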
DRAWBACKS OF SHARED-EXCLUSIVE LOCKING
• The protocol by itself is not sufficient to guarantee
serializable schedules, which means it cannot
always provide consistent data.
• If we do not use locking, or if we unlock data items
too soon after reading or writing them, we may get
inconsistent states.
• If we do not unlock an item before requesting a lock
on another item, deadlocks may occur.
• Given a sequence of requests for shared locks, if
each transaction releases its shared lock a short while
after it is granted, a transaction T1 waiting for an
exclusive-mode lock may never get it. Hence, the
transaction is starved.
TWO-PHASE LOCKING (2PL)
One protocol that ensures serializability is the two-phase locking protocol. This protocol
requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase : A transaction may obtain locks, but may not release any lock.
2. Shrinking phase : A transaction may release locks, but may not obtain any new
locks.
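The two phases can be enforced with simple bookkeeping: once a transaction releases any lock, it enters the shrinking phase and all further lock requests are rejected. A minimal Python sketch (lock conflicts between transactions are omitted to keep the focus on the two-phase rule itself):

```python
class TwoPhaseLockingTxn:
    """Sketch of 2PL bookkeeping: after the first unlock (shrinking
    phase), a transaction may not acquire any new lock."""
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True          # the growing phase is over
        self.locks.discard(item)

t = TwoPhaseLockingTxn("T1")
t.lock("A"); t.lock("B")               # growing phase
t.unlock("A")                          # shrinking phase begins here
try:
    t.lock("C")                        # illegal under 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```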
Modifications of the 2PL protocol include :-
We can use a deadlock prevention protocol to ensure that the system will never enter a
deadlock state.
Alternatively, we can allow the system to enter a deadlock state, and then try to
recover by using a deadlock detection and deadlock recovery scheme.
To do so, the system must:
• Maintain information about the current allocation of data items to transactions, as
well as any outstanding data item requests.
• Provide an algorithm that uses this information to determine whether the system
has entered a deadlock state.
• Recover from the deadlock when the detection algorithm determines that a
deadlock exists.
Deadlocks can be described precisely in terms of a directed graph called a wait-for
graph.
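A deadlock-detection pass over the wait-for graph is then a cycle check. The sketch below assumes, for simplicity, that each transaction waits for at most one other transaction (with shared locks a transaction can wait on several holders, which would need a full graph search):

```python
def find_deadlock(wait_for):
    """wait_for: dict mapping each txn to the txn it waits on (or None).
    A deadlock exists iff following the edges revisits a transaction."""
    for start in wait_for:
        seen = set()
        t = start
        while t is not None:
            if t in seen:
                return True            # cycle found: deadlock
            seen.add(t)
            t = wait_for.get(t)
    return False

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a cycle, so deadlock.
assert find_deadlock({"T1": "T2", "T2": "T3", "T3": "T1"})
# T1 waits for T2, which is running freely: no deadlock.
assert not find_deadlock({"T1": "T2", "T2": None})
```

Once a cycle is found, recovery typically aborts one transaction in the cycle (the victim) to break it.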
MULTIPLE GRANULARITY
Granularity is the size of the data item that is allowed to be locked.
Multiple granularity means hierarchically breaking up the database into blocks that
can be locked, so that the system can track what needs to be locked and in what fashion.
TIMESTAMP-BASED PROTOCOLS
Another method for determining the serializability order is to select an ordering among
transactions in advance. The most common method for doing so is to use a timestamp-
ordering scheme.
TIMESTAMPS
There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is
equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is,
a transaction’s timestamp is equal to the value of the counter when the transaction enters
the system.
To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q)- largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q)- largest timestamp of any transaction that executed read(Q) successfully.
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order. This protocol operates as follows:
• If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence,
the read operation is rejected, and Ti is rolled back.
• If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the
maximum of R-timestamp(Q) and TS(Ti).
• If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the
system assumed that that value would never be produced. Hence, the system rejects the write
operation and rolls Ti back.
• If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system
rejects this write operation and rolls Ti back.
• Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
If a transaction Ti is rolled back by the concurrency-control scheme as result of issuance of either a read
or write operation, the system assigns it a new timestamp and restarts it.
Thomas’ Write Rule
The modification to the timestamp-ordering protocol, called Thomas’ write
rule, is this: Suppose that transaction Ti issues write(Q).
1.If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously
needed, and it had been assumed that the value would never be produced. Hence,
the system rejects the write operation and rolls Ti back.
2.If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.
Hence, this write operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to
TS(Ti).
The difference between these rules and those of timestamp ordering lies in the second
rule. The timestamp-ordering protocol requires that Ti be rolled back if Ti issues
write(Q) and TS(Ti) < W-timestamp(Q). However, here, in those cases where TS(Ti) ≥ R-
timestamp(Q), we ignore the obsolete write.
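The read and write tests above, including Thomas' write rule, can be collected into a small Python class that tracks R-timestamp(Q) and W-timestamp(Q) for a single data item Q. Transaction timestamps are passed in as plain integers:

```python
class TimestampOrdering:
    """Timestamp-ordering checks for one data item Q.
    thomas=True enables Thomas' write rule (obsolete writes are ignored)."""
    def __init__(self, thomas=False):
        self.r_ts = 0    # R-timestamp(Q): largest ts of a successful read
        self.w_ts = 0    # W-timestamp(Q): largest ts of a successful write
        self.thomas = thomas

    def read(self, ts):
        if ts < self.w_ts:
            return "rollback"          # Q was already overwritten
        self.r_ts = max(self.r_ts, ts)
        return "ok"

    def write(self, ts):
        if ts < self.r_ts:
            return "rollback"          # a later reader needed the old value
        if ts < self.w_ts:
            # obsolete write: ignored under Thomas' rule, else rolled back
            return "ignore" if self.thomas else "rollback"
        self.w_ts = ts
        return "ok"

q = TimestampOrdering(thomas=True)
assert q.write(10) == "ok"
assert q.read(5) == "rollback"     # T5 would read a value written by T10
assert q.write(7) == "ignore"      # obsolete write skipped, not rolled back
```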
TIMESTAMP ORDERING SCHEME
ADVANTAGES:
The validation scheme automatically guards against cascading rollbacks, since the
actual writes take place only after the transaction issuing the write has committed.
It has the desirable property that a read request never fails and is never made to wait.
It helps prevent deadlocks by allowing transactions to read and write different versions
of data items. This flexibility minimizes the chances of circular dependencies leading to
deadlocks.
DISADVANTAGES:
The reading of a data item also requires the updating of the R-timestamp field,
resulting in two potential disk accesses, rather than one.
The conflicts between transactions are resolved through rollbacks, rather than through
waits. This alternative may be expensive.
Multiversion Two-Phase Locking
The multiversion two-phase locking protocol attempts to combine the advantages of
multiversion concurrency control with the advantages of two-phase locking. This protocol
differentiates between read-only transactions and update transactions.
• Update transactions perform rigorous two-phase locking; that is, they hold all locks up
to the end of the transaction. Thus, they can be serialized according to their commit
order.
• Each version of a data item has a single timestamp. The timestamp in this case is not a
real clock-based timestamp, but rather a counter, which we will call the ts-counter,
that is incremented during commit processing.
• The database system assigns read-only transactions a timestamp by reading the
current value of ts-counter before they start execution; they follow the multiversion
timestamp-ordering protocol for performing reads.
• Thus, when a read-only transaction Ti issues a read(Q), the value returned is the
contents of the version whose timestamp is the largest timestamp less than or equal
to TS(Ti)
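The version-selection rule for read-only transactions can be sketched in one function: among all versions whose timestamp is ≤ TS(Ti), return the one with the largest timestamp. The version list below is an illustrative example:

```python
def read_version(versions, ts):
    """versions: list of (timestamp, value) pairs for one data item.
    A read-only transaction with timestamp ts sees the version whose
    timestamp is the largest one that is <= ts."""
    eligible = [(t, v) for (t, v) in versions if t <= ts]
    return max(eligible)[1] if eligible else None

versions = [(1, "v1"), (4, "v4"), (9, "v9")]
assert read_version(versions, 5) == "v4"   # the later version at ts 9 is invisible
assert read_version(versions, 9) == "v9"
```

Because the version chosen depends only on the reader's timestamp, read-only transactions never wait and never cause update transactions to wait.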
Snapshot Isolation: First Committer Wins
Under first committer wins, when a transaction T enters the partially committed state, the
following actions are taken in an atomic action:
• A test is made to see if any transaction that was concurrent with T has already written an
update to the database for some data item that T intends to write.
• If some such transaction is found, then T aborts.
• If no such transaction is found, then T commits and its updates are written to the
database.
• Consider a situation, where one transaction is applying the aggregate function on some records while
another transaction is updating these records.
• The aggregate function may calculate some values before the values have been updated and others after
they are updated.
[Figure: disk blocks buffered in main memory, with local copies x1, x2, y1 in transaction work areas]
Data Access (Cont.)
• Each transaction Ti has its private work-area in which local
copies of all data items accessed and updated by it are kept.
• Ti's local copy of a data item X is called xi.
• Transferring data items between system buffer blocks and its
private work-area done by:
• read(X) assigns the value of data item X to the local variable xi.
• write(X) assigns the value of local variable xi to data item X in the
buffer block.
• Note: output(BX) need not immediately follow write(X). System can
perform the output operation when it deems fit.
• Transactions
• Must perform read(X) before accessing X for the first time
(subsequent reads can be from local copy)
• write(X) can be executed at any time before the transaction commits
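The read(X)/write(X)/output(B) separation can be illustrated with plain dictionaries standing in for the disk, the system buffer, and the transaction's private work area. Note that write(X) changes only the buffer; the disk changes only when output is performed later:

```python
buffer = {"BX": {"X": 100}}        # system buffer block holding data item X
disk   = {"BX": {"X": 100}}
local  = {}                         # Ti's private work area (local copies xi)

def read(item, block):
    local[item] = buffer[block][item]       # buffer block -> local copy

def write(item, block):
    buffer[block][item] = local[item]       # local copy -> buffer block

def output(block):
    disk[block] = dict(buffer[block])       # buffer block -> disk, whenever the
                                            # system deems fit (need not be now)

read("X", "BX")
local["X"] -= 50                    # the transaction updates its private copy
write("X", "BX")
assert buffer["BX"]["X"] == 50 and disk["BX"]["X"] == 100   # not yet on disk
output("BX")
assert disk["BX"]["X"] == 50
```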
Recovery and Atomicity
• One simple approach is the shadow-copy scheme.
Log-Based Recovery
• A log is kept on stable storage.
• The log is a sequence of log records, and maintains a record of
update activities on the database.
• When transaction Ti starts, it registers itself by writing a
<Ti start>log record
• Before Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old
value), and V2 is the value to be written to X (the new value).
• When Ti finishes its last statement, the log record <Ti commit>
is written.
• Two approaches using logs
• Deferred database modification
• Immediate database modification
Immediate Database Modification
• The immediate-modification scheme allows updates of an
uncommitted transaction to be made to the buffer, or the disk
itself, before the transaction commits
• Update log record must be written before database item is
written
• We assume that the log record is output directly to stable storage
• (Will see later that how to postpone log record output to some
extent)
• Output of updated blocks to stable storage can take place at
any time before or after transaction commit
• Order in which blocks are output can be different from the
order in which they are written.
• The deferred-modification scheme performs updates to
buffer/disk only at the time of transaction commit
• Simplifies some aspects of recovery
• But has overhead of storing local copy
Transaction Commit
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
A = 950
B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
C = 600
<T1 commit>
(Block BC may be output before T1 commits; blocks BA and BB may be output after T0 commits.)
• Undo of a log record <Ti, X, V1, V2> writes the old value V1 to
X
• Redo of a log record <Ti, X, V1, V2> writes the new value V2 to
X
• Undo and Redo of Transactions
• undo(Ti) restores the value of all data items updated by Ti to their old
values, going backwards from the last log record for Ti
• each time a data item X is restored to its old value V a special log record <Ti , X,
V> is written out
• when undo of a transaction is complete, a log record
<Ti abort> is written out.
• redo(Ti) sets the value of all data items updated by Ti to the new
values, going forward from the first log record for Ti
• No logging is done in this case
Undo and Redo on Recovering from Failure
• When recovering after failure:
• Transaction Ti needs to be undone if the log
• contains the record <Ti start>,
• but does not contain either the record <Ti commit> or <Ti abort>.
• Transaction Ti needs to be redone if the log
• contains the records <Ti start>
• and contains the record <Ti commit> or <Ti abort>
• Note that if transaction Ti was undone earlier and the <Ti abort>
record written to the log, and then a failure occurs, on recovery
from failure Ti is redone
• such a redo redoes all the original actions including the steps that
restored old values
• Known as repeating history
• Seems wasteful, but simplifies recovery greatly
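The undo/redo decision rules can be sketched as a single scan over the log: a transaction with a start record but neither commit nor abort is undone, and one with a commit or abort record is redone (repeating history). Log records are modeled here as plain tuples:

```python
def recover(log):
    """log: list of tuples beginning with (txn, kind, ...).
    Returns (redo_set, undo_set) per the rules above: redo if
    <Ti commit> or <Ti abort> appears, undo otherwise."""
    started, finished = set(), set()
    for rec in log:
        if rec[1] == "start":
            started.add(rec[0])
        elif rec[1] in ("commit", "abort"):
            finished.add(rec[0])
    redo = started & finished
    undo = started - finished
    return redo, undo

log = [("T0", "start"), ("T0", "update", "A", 1000, 950), ("T0", "commit"),
       ("T1", "start"), ("T1", "update", "C", 700, 600)]
redo, undo = recover(log)
assert redo == {"T0"} and undo == {"T1"}
```

In a full implementation the redo pass then replays new values forward from each transaction's first record, and the undo pass restores old values backward, writing <Ti, X, V> and <Ti abort> records as it goes.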
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of time.
• Undo phase:
1. Scan log backwards from end
1. Whenever a log record <Ti, Xj, V1, V2> is found where Ti is in undo-list
perform same actions as for transaction rollback:
1. perform undo by writing V1 to Xj.
2. write a log record <Ti , Xj, V1>
2. Whenever a log record <Ti start> is found where Ti is in undo-list,
1. Write a log record <Ti abort>
2. Remove Ti from undo-list
3. Stop when undo-list is empty
i.e. <Ti start> has been found for every transaction in undo-list
[Figure: log containing two <checkpoint L> records; recovery needs only the portion of the log after last_checkpoint]
Failure with Loss of Nonvolatile Storage
• So far we assumed no loss of non-volatile storage
• Technique similar to checkpointing used to deal with loss of non-
volatile storage
• Periodically dump the entire content of the database to stable
storage
• No transaction may be active during the dump procedure; a
procedure similar to checkpointing must take place
• Output all log records currently residing in main memory onto
stable storage.
• Output all buffer blocks onto the disk.
• Copy the contents of the database to stable storage.
• Output a record <dump> to log on stable storage.
Recovering from Failure of Non-Volatile Storage
Logical Undo Logging
• Some operations, such as B+-tree insertions and deletions, release their locks early.
• They cannot be undone by restoring old values (physical undo), since
once a lock is released, other transactions may have updated the B+-tree.
• Instead, insertions (resp. deletions) are undone by executing a deletion
(resp. insertion) operation (known as logical undo).
• For such operations, undo log records should contain the undo
operation to be executed
• Such logging is called logical undo logging, in contrast to physical
undo logging
• Operations are called logical operations
• Other examples:
• delete of tuple, to undo insert of tuple
• allows early lock release on space allocation information
• subtract amount deposited, to undo deposit
• allows early lock release on bank balance
Physical Redo
• Each page contains a PageLSN which is the LSN of the last log
record whose effects are reflected on the page
• To update a page:
• X-latch the page, and write the log record
• Update the page
• Record the LSN of the log record in PageLSN
• Unlock page
• To flush page to disk, must first S-latch page
• Thus page state on disk is operation consistent
• Required to support physiological redo
• PageLSN is used during recovery to prevent repeated redo
• Thus ensuring idempotence
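The PageLSN test that makes redo idempotent can be shown in a few lines: a log record is reapplied only if its LSN is greater than the LSN already recorded on the page, so running the redo pass twice changes nothing. Pages and log records are modeled as dicts and tuples for illustration:

```python
def redo_pass(page, log):
    """Apply each (lsn, item, value) record only if the page has not
    already seen it (page['page_lsn'] < lsn): idempotent redo."""
    for lsn, item, value in log:
        if lsn > page["page_lsn"]:
            page[item] = value
            page["page_lsn"] = lsn     # record the applied LSN on the page

# The page on disk already reflects records up to LSN 2.
page = {"page_lsn": 2, "X": "v2"}
log = [(1, "X", "v1"), (2, "X", "v2"), (3, "X", "v3")]
redo_pass(page, log)                   # only the record at LSN 3 is applied
assert page["X"] == "v3" and page["page_lsn"] == 3
redo_pass(page, log)                   # repeating redo changes nothing
assert page["X"] == "v3" and page["page_lsn"] == 3
```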
ARIES Data Structures: Log Record
• Each log record contains LSN of previous log record of the same
transaction
Log record fields: LSN | TransID | PrevLSN | RedoInfo | UndoInfo
[Figure: log records 1–4 and their compensation log records 4', 3', 2', 1', linked via PrevLSN]
ARIES Data Structures: DirtyPage Table
• DirtyPageTable
• List of pages in the buffer that have been updated
• Contains, for each such page
• PageLSN of the page
• RecLSN is an LSN such that log records before this LSN have already been
applied to the page version on disk
• Set to current end of log when a page is inserted into dirty page table (just before
being updated)
• Recorded in checkpoints, helps to minimize redo work
ARIES Data Structures
ARIES Data Structures: Checkpoint Log
Undo pass
ARIES Recovery: Analysis
Analysis pass
• Starts from last complete checkpoint log record
• Reads DirtyPageTable from log record
• Sets RedoLSN = min of RecLSNs of all pages in DirtyPageTable
• In case no pages are dirty, RedoLSN = checkpoint record’s LSN
• Sets undo-list = list of transactions in checkpoint log record
• Reads LSN of last log record for each transaction in undo-list from
checkpoint log record
• Scans forward from checkpoint
• If any log record found for transaction not in undo-list, adds transaction
to undo-list
• Whenever an update log record is found
• If page is not in DirtyPageTable, it is added with RecLSN set to LSN of the update
log record
• If transaction end log record found, delete transaction from undo-list
• Keeps track of last log record for each transaction in undo-list
• May be needed for later undo
• At end of analysis pass:
• RedoLSN determines where to start redo pass
• RecLSN for each page in DirtyPageTable used to minimize redo work
• All transactions in undo-list need to be rolled back
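The analysis-pass rules above can be sketched as follows; the checkpoint and log-record formats are simplified stand-ins for illustration.

```python
# Simplified sketch of the ARIES analysis pass (record formats illustrative).
checkpoint = {"dirty_pages": {"P1": 5},      # page -> RecLSN
              "active_txns": {"T1": 4}}      # txn -> LSN of its last log record
# Log records found after the checkpoint: (lsn, txn, kind, page)
log_after_ckpt = [
    (6, "T2", "update", "P2"),
    (7, "T1", "update", "P1"),
    (8, "T2", "end", None),
]

dirty_pages = dict(checkpoint["dirty_pages"])
undo_list = dict(checkpoint["active_txns"])

for lsn, txn, kind, page in log_after_ckpt:
    if kind == "end":
        undo_list.pop(txn, None)     # transaction finished: nothing to undo
        continue
    undo_list[txn] = lsn             # add txn if new; track its last log record
    if page not in dirty_pages:
        dirty_pages[page] = lsn      # RecLSN = LSN of this update record

redo_lsn = min(dirty_pages.values()) if dirty_pages else None
print(undo_list, dirty_pages, redo_lsn)
```

Here T2 is added to the undo-list at LSN 6 and removed again by its end record, so only T1 needs rollback, and redo starts at LSN 5.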
Other ARIES Features
• Fine-grained locking:
• Index concurrency algorithms that permit tuple level locking on indices
can be used
• These require logical undo, rather than physical undo, as in earlier recovery
algorithm
• Recovery optimizations: For example:
• Dirty page table can be used to prefetch pages during redo
• Out of order redo is possible:
• redo can be postponed on a page being fetched from disk, and
performed when page is fetched.
• Meanwhile other log records can continue to be processed
Remote Backup Systems
Case Study: E-Commerce Database
Basic structure
o Functional requirements
o Entity-Relationship (ER) diagram and constraints
o Relational database schema
Implementation
o Creating tables
o Inserting data
Queries
o Basic queries
o PL/SQL function
o Trigger function
o Stored procedures
o Functions
o Transactions
1. Project Description
In this modern era of online shopping, no seller wants to be left behind, and every seller wants to shift from an offline selling model to an online one for rapid growth. Therefore, as software engineers, our job is to ease this transition for the seller. Among the many things an online site requires, the most important is a database system. Hence, in this project we design a database through which small sellers can sell their products online.
The prime objective of our database project is to design a robust e-commerce database supporting operations such as:
Viewing orders
Placing orders
Updating database
Reviewing products
Maintaining data consistency across tables
2. Requirements
A customer can see the account details and update them if required.
A customer can search for products by category.
A customer can add wish-list items to the cart and see the total amount.
A customer can update the cart whenever required.
A customer can choose the mode of payment.
A customer can track an order by viewing its status.
A customer can review products that have been purchased.
A seller can update the stock of a particular product to mark whether it is available.
A seller can keep track of the total sales of his products.
A seller can view the sales on a particular day, month, or year.
ENTITIES | ATTRIBUTES (ATTRIBUTE TYPE) | Entity Type
Customer | Customer_CustomerId (Simple), Name (Composite), Email (Simple), DateOfBirth (Simple), Phone (Multivalued), Age (Derived) | Strong
Order | OrderId (Simple), ShippingDate (Simple), OrderDate (Simple), OrderAmount (Simple), Cart_CartID (Simple) | Strong
Review | ReviewId (PK, Simple), Description (Simple), Ratings (Simple), Product_ProductId (Simple), Customer_CustomerID (Simple) | Strong
Category | CategoryID (PK, Simple), CategoryName (Simple), DESCRIPTION (Simple) | Strong
Payment | payment_id (Simple), Order_OrderId (Simple), PaymentMode (Simple), Customer_CustomerId (Simple), PaymentDate (Simple) | Strong
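Two of the entities above can be turned into tables as in the following SQLite sketch. The column names come from the attribute table; the types and foreign-key choices are assumptions for illustration, the multivalued Phone attribute is stored inline for brevity, and the derived attribute Age is not stored at all.

```python
# Minimal sketch of the Customer and Payment tables (types are assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerId  INTEGER PRIMARY KEY,
    Name        TEXT,
    Email       TEXT,
    DateOfBirth TEXT,
    Phone       TEXT   -- multivalued attribute; a separate table would be cleaner
    -- Age is derived from DateOfBirth, so it is not stored
);
CREATE TABLE Payment (
    payment_id          INTEGER PRIMARY KEY,
    Order_OrderId       INTEGER,
    PaymentMode         TEXT,
    Customer_CustomerId INTEGER REFERENCES Customer(CustomerId),
    PaymentDate         TEXT
);
""")
conn.execute("INSERT INTO Customer (Name, Email) VALUES (?, ?)",
             ("Asha", "asha@example.com"))
print(conn.execute("SELECT Name FROM Customer").fetchone()[0])  # Asha
```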
ENTITY 1 (Participation) | RELATIONSHIP (Cardinality) | ENTITY 2 (Participation)
Customer (Total) | Stays At (OneToOne) | Address (Partial)
Customer (Partial) | Shops (OneToOne) | Cart (Total)
Customer (Partial) | Places (OneToMany) | Order (Total)
Customer (Partial) | Makes (OneToMany) | Payment (Total)
Customer (Partial) | Write (OneToMany) | Review (Total)
Seller (Partial) | Sells (ManyToMany) | Product (Total)
Category (Partial) | Categorizes (OneToMany) | Product (Total)
Cart (Partial) | Contains (ManyToMany) | Product (Partial)
Product (Partial) | Includes (OneToMany) | OrderItem (Total)
Order (Partial) | Includes (OneToOne) | OrderItem (Total)
Payment (Total) | For (OneToOne) | Order (Total)
6. ER Diagram
QUERIES ON THE ABOVE RELATIONAL SCHEMA
3. Using triggers to update the number of products as soon as the payment is made.
4. Trigger to update the user's total amount every time he adds something to the payment table.
6. Processing an order
To process an order, first check whether the ordered items are in stock.
If items are in stock, they must be reserved so that they go to the customers who requested them in a wishlist or order.
Once ordered, the available quantity must be reduced to reflect the correct stock value.
Any items not in stock cannot be sanctioned; this requires confirmation from the seller.
The customer must be informed which items are in stock (and can be shipped immediately) and which are cancelled.
If any of these steps fails, ROLLBACK.
If adding tuples to OrderItem fails, roll back the stock updates made for those products and the inserted Order row.
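The order-processing steps above can be sketched as a single transaction; the schema, table names, and function are simplified assumptions for illustration. If any item is out of stock or any insert fails, the rollback undoes the stock updates and the inserted order rows together.

```python
# Sketch of order processing as one transaction (schema simplified).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Stock INTEGER)")
conn.execute("CREATE TABLE OrderItem (OrderId INTEGER, ProductId INTEGER, Qty INTEGER)")
conn.execute("INSERT INTO Product VALUES (1, 5)")
conn.commit()

def place_order(order_id, items):
    try:
        for product_id, qty in items:
            (stock,) = conn.execute(
                "SELECT Stock FROM Product WHERE ProductId = ?", (product_id,)
            ).fetchone()
            if stock < qty:
                raise ValueError("item not in stock")  # would need seller confirmation
            conn.execute("UPDATE Product SET Stock = Stock - ? WHERE ProductId = ?",
                         (qty, product_id))
            conn.execute("INSERT INTO OrderItem VALUES (?, ?, ?)",
                         (order_id, product_id, qty))
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # undo stock updates and inserted order rows together
        return False

print(place_order(1, [(1, 2)]))   # True: stock reduced from 5 to 3
print(place_order(2, [(1, 9)]))   # False: rolled back, stock stays 3
```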
QUERY 1: Customers to find products with highest ratings for a given category.
QUERY 2: Customers to filter out the products according to their brand and price.
QUERY 3: Find the total price of all products present in a customer's cart.
QUERY 4: Customers to find the best seller of a particular product.
QUERY 5: List the orders which are to be delivered at a particular pincode.
QUERY 6: List the product whose sale is the highest on a particular day.
QUERY 7: List the category of product which has been sold the highest on a particular day.
QUERY 8: List the customers who bought products from a particular seller the most.
QUERY 9: List all the orders whose payment mode is not CoD and yet to be delivered.
QUERY 10: List all orders of customers whose total amount is greater than 5000.
QUERY 11: If a customer wants to modify the cart, that is, delete some products from it.
QUERY 12: List the seller who has the highest stock of a particular product.
QUERY 13: Customers to compare the products based on their ratings and reviews.
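As one worked example, QUERY 3 (total price of all products in a customer's cart) can be sketched as below; the table and column names are assumptions based on the schema above, with quantities added to the cart rows.

```python
# Sketch of QUERY 3 in SQLite (table/column names are assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (ProductId INTEGER PRIMARY KEY, Price REAL);
CREATE TABLE CartItem (CartId INTEGER, ProductId INTEGER, Qty INTEGER);
INSERT INTO Product  VALUES (1, 100.0), (2, 250.0);
INSERT INTO CartItem VALUES (7, 1, 2), (7, 2, 1);
""")
(total,) = conn.execute("""
    SELECT SUM(p.Price * c.Qty)
    FROM CartItem c
    JOIN Product  p ON p.ProductId = c.ProductId
    WHERE c.CartId = ?
""", (7,)).fetchone()
print(total)  # 450.0 (2 x 100 + 1 x 250)
```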
CASE STUDY ON NoSQL DATABASES: Document Oriented, Key-Value Pairs, Column Oriented, and Graph
NoSQL
A NoSQL database is a non-relational data management system that does not require a fixed schema.
NoSQL databases are non-tabular and handle data storage differently than relational tables.
These databases are classified according to the data model, and popular types include
document, graph, column, and key-value.
Non-relational in nature
The core function of NoSQL is to provide a mechanism for storing and retrieving information.
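The storage models named above can be contrasted with a small sketch. Plain Python dicts stand in for real systems here (MongoDB, Redis, and Cassandra are the usual examples); the layouts are illustrative only.

```python
# Illustrative sketch: the same product data in three NoSQL layouts.

# Document oriented: one self-contained aggregate per key;
# schema can vary from record to record.
document_store = {
    "prod:1": {"name": "Pen", "price": 10, "tags": ["stationery"]},
    "prod:2": {"name": "Lamp", "price": 300, "warranty_months": 12},  # extra field is fine
}

# Key-value: an opaque value retrievable only by its key.
kv_store = {"prod:1": b'{"name": "Pen", "price": 10}'}

# Column oriented: values grouped by column, addressed by (column, row key).
column_store = {
    "name":  {"prod:1": "Pen", "prod:2": "Lamp"},
    "price": {"prod:1": 10,    "prod:2": 300},
}

print(document_store["prod:2"]["warranty_months"])  # 12
print(column_store["price"]["prod:1"])              # 10
```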
NoSQL
NoSQL stands for “Not Only SQL” or “Not SQL.”
NoSQL is used for big data and real-time web applications, for example by companies like Twitter and Facebook.
Why NoSQL
Internet giants like Google, Facebook, and Amazon deal with huge volumes of data, and system response time becomes slow when an RDBMS is used for such massive volumes.
To resolve this problem, we could “scale up” our systems by upgrading the existing hardware, but this process is expensive.
The alternative is to distribute the database load over multiple hosts whenever the load increases.
This method is known as “scaling out.”
Features of NoSQL
Non-relational
•NoSQL databases never follow the relational model
•Never provide tables with flat fixed-column records
•Work with self-contained aggregates
•Don't require object-relational mapping or data normalization
•No complex features like query languages, query planners, referential-integrity joins, or ACID transactions
Schema-free
•NoSQL databases are either schema-free or have relaxed schemas
•Do not require any sort of definition of the schema of the data
•Offers heterogeneous structures of data in the same domain
Simple API
•Offers easy-to-use interfaces for storing and querying data
•APIs allow low-level data manipulation & selection methods
•Text-based protocols, mostly HTTP REST with JSON
•Mostly no standards-based NoSQL query language
•Web-enabled databases running as internet-facing services
Distributed
•Multiple NoSQL databases can be executed in a distributed fashion
•Offers auto-scaling and fail-over capabilities
•ACID guarantees are often sacrificed for scalability and throughput
•Mostly no synchronous replication between distributed nodes; instead asynchronous multi-master replication, peer-to-peer, or HDFS replication
•Often provides only eventual consistency
•Shared-nothing architecture, which enables less coordination and higher distribution
Types of NoSQL Databases
NoSQL Databases are mainly categorized into four types: