Computer Architecture Assignment 3 (ARCH)


Assignment 3

Submitted To: Morris Lancaster


Submitted By: Tejasvi Sharma 4/18/2013

CSci6461 Lancaster

Homework Set 3

Spring 2013

1. Let α be the fraction of a program P's code that can be executed simultaneously by n processors in a computer system, while the remaining code must be executed sequentially by a single processor. Each processor has an execution rate of x million instructions per second (MIPS).
a. Derive an expression for the effective MIPS rate when using this system to execute this program, in terms of x, n and α.
b. If n = 16 and x = 4 MIPS, determine the value of α that will yield a system performance of 34 MIPS.

Solution

A. Expression for the effective MIPS rate of the system.

By Amdahl's law,

$$\text{Speedup} = \frac{1}{\dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} + (1 - \text{Fraction}_{\text{enhanced}})}$$

Program P can be run in parallel on n processors. The fraction α of the program therefore runs n times faster, because n parallel processors make that part n times faster. The remaining fraction 1 - α runs at the regular speed. Therefore

$$\text{Speedup} = \frac{\text{speed with the parallel part}}{\text{original speed without parallelism}} = \frac{1}{\alpha/n + (1-\alpha)}$$

The original speed without parallelism is x MIPS, so the effective MIPS rate is

$$\text{MIPS}_{\text{eff}} = \frac{x}{\alpha/n + (1-\alpha)}$$

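B. We are given a target rate of 34 MIPS, n = 16 and x = 4 MIPS. Substituting into the formula above:

$$34 = \frac{4}{\alpha/16 + (1-\alpha)}$$

Multiplying both sides by the denominator:

$$2.125\alpha + 34 - 34\alpha = 4 \;\Rightarrow\; 31.875\alpha = 30 \;\Rightarrow\; \alpha = \frac{30}{31.875} = \frac{16}{17} \approx 0.941$$

Check: with α = 16/17, α/16 + (1 - α) = 1/17 + 1/17 = 2/17, and 4/(2/17) = 34 MIPS.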
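Parts A and B can be checked numerically by evaluating the formula and then inverting it for α. The sketch below is a minimal one. The function names `effective_mips` and `required_alpha` are illustrative and do not come from the assignment.

```cpp
#include <cassert>

// Effective MIPS rate when a fraction `alpha` of the instructions runs on
// `n` processors in parallel and the rest runs on a single processor,
// with every processor executing at `x` MIPS.
double effective_mips(double alpha, int n, double x) {
    assert(alpha >= 0.0 && alpha <= 1.0 && n >= 1 && x > 0.0);
    return x / (alpha / n + (1.0 - alpha));
}

// Inverts the formula to find which alpha gives a target effective rate:
//   target = x / (alpha/n + 1 - alpha)
//   => 1 - alpha * (1 - 1/n) = x / target
//   => alpha = (1 - x/target) / (1 - 1/n)
double required_alpha(double target, int n, double x) {
    assert(n > 1 && target >= x && target < n * x);
    return (1.0 - x / target) / (1.0 - 1.0 / n);
}
```

`required_alpha(34.0, 16, 4.0)` evaluates to about 0.941 (16/17). `effective_mips(1.0, 16, 4.0)` returns 64, the n·x ceiling that is reached only when the whole program runs in parallel.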


2. Directory protocols are more scalable than snooping protocols because they send explicit request and invalidate messages to those nodes that have copies of a block, while snooping protocols broadcast all requests and invalidates to all nodes. Consider the 16-processor system illustrated in Figure 4.42 and assume that all caches not shown have invalid blocks. For each of the sequences below, identify which nodes receive each request and invalidate.
a. [10] <4.4> P1: write 120 <-- 80
b. [10] <4.4> P1: write 110 <-- 88
c. [10] <4.4> P15: write 118 <-- 90
d. [10] <4.4> P15: write 108 <-- 98

Solution

a. P1: write 120 <-- 80: send invalidate to P15
b. P1: write 110 <-- 88: send fetch/invalidate to P0
c. P15: write 118 <-- 90: send invalidate to P1
d. P15: write 108 <-- 98: send invalidate to P0
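The scalability argument compares how many nodes receive each message under the two schemes. The sketch below is a simplified model. The full-map directory entry layout (a state plus one presence bit per node) and the example sharer set are hypothetical, because the contents of Figure 4.42 are not reproduced here. For instance, if the directory records that only P15 holds block 120, a write by P1 sends one invalidate to P15 under the directory scheme. Under snooping, the same write is broadcast to all 15 other nodes.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

constexpr int kNodes = 16;  // the 16-processor system of the exercise

// Simplified full-map directory entry (hypothetical layout).
enum class DirState { Uncached, Shared, Exclusive };
enum class Msg { Invalidate, FetchInvalidate };

struct DirEntry {
    DirState state = DirState::Uncached;
    std::bitset<kNodes> sharers;  // bit p set => node p holds a copy
};

// Directory protocol: on a write by `writer`, the home node sends messages
// only to nodes whose presence bit is set (never to the writer itself).
std::vector<int> directory_targets(const DirEntry& e, int writer) {
    std::vector<int> targets;
    for (int p = 0; p < kNodes; ++p)
        if (e.sharers.test(p) && p != writer) targets.push_back(p);
    return targets;
}

// An exclusive (possibly dirty) owner must return the data before giving up
// the block, so it receives fetch/invalidate; clean sharers just invalidate.
Msg directory_message(const DirEntry& e) {
    return e.state == DirState::Exclusive ? Msg::FetchInvalidate : Msg::Invalidate;
}

// Snooping protocol: the write is broadcast, so every other node sees it.
std::vector<int> snooping_targets(int writer) {
    std::vector<int> targets;
    for (int p = 0; p < kNodes; ++p)
        if (p != writer) targets.push_back(p);
    return targets;
}

// After the write completes, the writer is the sole exclusive owner.
void directory_record_write(DirEntry& e, int writer) {
    assert(writer >= 0 && writer < kNodes);
    e.sharers.reset();
    e.sharers.set(writer);
    e.state = DirState::Exclusive;
}
```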


3. If you were developing a multiprocessor using Uniform Memory Access (UMA) and Symmetric Multiprocessor (SMP) configurations, and you wanted to maximize the number of processors that could be configured in the system, which of the following would you use? Select all that apply and explain why you would use the feature.
a. Write-through cache
b. Write-back cache
c. Fully associative cache
d. Direct-mapped cache
e. Direct-mapped cache with victim cache
Solution

In an SMP system, multiple processors share all other system resources (memory, disk, etc.). Most modern high-end processors use a write-through policy for the L1 cache and a write-back policy for the lower-level caches, for several reasons:

- In this class of processors, L2 caches are almost exclusively on-chip and generally quite fast, so the penalty of a write-through L1 is not the major consideration.
- L1 caches are small, so pools of written data that are unlikely to be read again could pollute the limited L1 capacity.
- A write-through L1 never holds dirty data, so it can leave the extra coherence logic to the L2, which already plays the larger part in cache coherence.

Because the lower levels are write-back, most writes stay off the shared memory bus. Shared-bus traffic is what limits the number of processors a UMA system can support.

Fully associative caches allow a cache line to reside in any entry of the cache. This avoids conflict misses (aliasing), since any entry is available for use. However, a fully associative cache is very expensive to implement in hardware, because every entry must be searched simultaneously to determine whether a value is in the cache, so I would not use it.

So I would build the multiprocessor with a write-through L1 and write-back lower-level caches. I can also use a direct-mapped cache: despite having worse miss ratios, large direct-mapped caches often handle processor references faster than more expensive set-associative caches.

Finally, a victim cache, which is usually fully associative, is intended to reduce the number of conflict misses. Many commonly used programs do not require associative mapping for all their accesses. In fact, only a small fraction of a program's memory accesses need high associativity. Because a small victim cache can supply that associativity cheaply, I would use a victim cache as well.
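A small model shows how a victim cache removes conflict misses in a direct-mapped cache. The sketch below makes several assumptions: it tracks tags only, the cache sizes are hypothetical, and the victim cache uses LRU replacement. In an 8-line cache, blocks 0 and 8 both map to set 0. Alternating between them misses on every access without a victim cache, but with a one-entry victim cache only the first two compulsory accesses miss.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Marker for an empty line in the main cache.
constexpr std::uint64_t kEmpty = UINT64_MAX;

// Direct-mapped cache (tags only, no data) backed by a small fully
// associative victim cache that holds recently evicted blocks.
class DirectMappedWithVictim {
public:
    DirectMappedWithVictim(std::size_t lines, std::size_t victim_entries)
        : tags_(lines, kEmpty), victim_capacity_(victim_entries) {
        assert(lines > 0);
    }

    // Returns true if `block` hits in either the main cache or the victim cache.
    bool access(std::uint64_t block) {
        std::size_t set = block % tags_.size();
        if (tags_[set] == block) return true;  // main-cache hit

        // Fully associative search of the victim cache.
        auto it = std::find(victim_.begin(), victim_.end(), block);
        bool victim_hit = (it != victim_.end());
        if (victim_hit) victim_.erase(it);  // block moves back into the main cache

        // The block displaced from the main cache goes into the victim cache.
        if (tags_[set] != kEmpty && victim_capacity_ > 0) {
            victim_.push_front(tags_[set]);  // most recently evicted at the front
            if (victim_.size() > victim_capacity_) victim_.pop_back();  // drop oldest (LRU)
        }
        tags_[set] = block;
        return victim_hit;
    }

private:
    std::vector<std::uint64_t> tags_;
    std::deque<std::uint64_t> victim_;
    std::size_t victim_capacity_;
};

// Counts misses (accesses that hit in neither structure) over a trace of block addresses.
std::size_t count_misses(std::size_t lines, std::size_t victim_entries,
                         const std::vector<std::uint64_t>& trace) {
    DirectMappedWithVictim cache(lines, victim_entries);
    std::size_t misses = 0;
    for (std::uint64_t b : trace)
        if (!cache.access(b)) ++misses;
    return misses;
}
```

With the trace {0, 8, 0, 8, 0, 8}, `count_misses(8, 0, trace)` counts 6 misses and `count_misses(8, 1, trace)` counts 2.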


4. Using what we know about caches and the principles of spatial and temporal locality, optimize the following code. For each technique you use, state what the technique is and briefly how it was applied.

    int x[1000];      //assume these two lines
    double y[1000];   //do not translate into code
    x[0]=1;
    x[1]=1;
    for (i=2; i<1000; i++){
        x[i] = x[i-1]+x[i-2];
    }
    for (i=0; i<1000; i++){
        if (i>0) {
            y[i] = double(x[i])/double(x[i-1]);
        } else {
            y[i] = .61803;
        }
    }

Solution

Optimized code:

    x[0]=1;
    x[1]=1;
    y[0]=.61803;
    y[1]=1;
    for (i=2; i<1000; i++){
        x[i] = x[i-1]+x[i-2];
        y[i] = double(x[i])/double(x[i-1]);
    }

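In the original second loop, the else branch is taken only on the first iteration (i = 0). Assigning y[0] before the loop (loop peeling) removes the if/else test from every iteration. Since y depends on x, and x[0] and x[1] are known before either loop runs, y[1] = x[1]/x[0] = 1 can also be assigned up front. The remaining body of the second loop can then be fused with the first loop (loop fusion), so each y[i] is computed immediately after x[i], while x[i] and x[i-1] are still in registers or the L1 cache. This reduces cache misses by improving temporal locality. It also preserves spatial locality, since x and y are still traversed sequentially, now in a single pass instead of two.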
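To confirm that peeling and fusion do not change the results, both versions can be run side by side. The sketch below makes one deliberate change: it uses `uint32_t` instead of the assignment's `int`. Here x[i] is the (i+1)-th Fibonacci number, which exceeds INT_MAX at x[46], so signed `int` would hit undefined behavior. Unsigned arithmetic wraps with defined behavior, which lets the two versions be compared exactly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 1000;

// Original code: two separate passes, with an if/else inside the second loop.
void original(std::uint32_t x[], double y[]) {
    x[0] = 1;
    x[1] = 1;
    for (std::size_t i = 2; i < N; i++) x[i] = x[i - 1] + x[i - 2];
    for (std::size_t i = 0; i < N; i++) {
        if (i > 0) y[i] = double(x[i]) / double(x[i - 1]);
        else       y[i] = .61803;
    }
}

// Optimized code: iterations 0 and 1 of the y loop are peeled off,
// and the rest of the y loop is fused into the x loop.
void optimized(std::uint32_t x[], double y[]) {
    x[0] = 1;
    x[1] = 1;
    y[0] = .61803;  // peeled i == 0 (the old else branch)
    y[1] = 1;       // peeled i == 1: x[1] / x[0] == 1
    for (std::size_t i = 2; i < N; i++) {
        x[i] = x[i - 1] + x[i - 2];
        y[i] = double(x[i]) / double(x[i - 1]);  // x[i], x[i-1] reused while still hot
    }
}

// Runs both versions and reports whether every x[i] and y[i] matches.
bool versions_agree() {
    static std::uint32_t xa[N], xb[N];
    static double ya[N], yb[N];
    original(xa, ya);
    optimized(xb, yb);
    for (std::size_t i = 0; i < N; i++)
        if (xa[i] != xb[i] || ya[i] != yb[i]) return false;
    return true;
}
```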


5. Research the MESI cache protocol. Draw a diagram with the 4 states and describe each. Tell how each state compares with the basic protocol covered in the lecture; that is, associate the Invalid state of each and note how they are the same. Do the same comparison for the other states. Also explain why the extra state was added, that is, what is the benefit of the additional state?

Solution

MESI (also known as the Illinois protocol) is a cache coherence protocol. Intel later introduced it in the Pentium processor to "support the more efficient write-back cache in addition to the write-through cache previously used by the Intel 486 processor". Every cache line is marked with one of the following four states, encoded in two additional bits:

M - Modified: The cache line is present only in the current cache, and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state.
E - Exclusive: The cache line is present only in the current cache, but is clean; it matches main memory.
S - Shared: Indicates that this cache line may be stored in other caches of the machine.
I - Invalid: Indicates that this cache line is invalid.

A cache may satisfy a read from any state except Invalid. An Invalid line must be fetched (into the Shared or Exclusive state) to satisfy a read. A write may only be performed if the cache line is in the Modified or Exclusive state. If it is in the Shared state, all other cached copies must be invalidated first, typically by a broadcast operation. A cache may discard a non-Modified line at any time, changing it to the Invalid state; a Modified line must be written back first.
A cache that holds a line in the Modified state must snoop (intercept) all attempted reads of the corresponding main memory location from the other CPUs in the system and insert the data that it holds. This is typically done by forcing the read to back off (i.e., aborting the memory bus transaction), then writing the data to main memory and changing the cache line to the Shared state. A cache that holds a line in the Shared state must also snoop all invalidate broadcasts from other CPUs and discard the line (by moving it to the Invalid state) on a match. A cache that holds a line in the Exclusive state must also snoop all read transactions from other CPUs and move the line to the Shared state on a match.

The Modified and Exclusive states are precise: they match the true cache line ownership in the system. The Shared state may be imprecise. If another CPU discards a Shared line and this CPU becomes the sole holder of that cache line, the line is not promoted to the Exclusive state, because broadcasting every cache line replacement from every CPU is not practical over a broadcast snoop bus. In that sense the Exclusive state is an opportunistic optimization. If the CPU wants to modify a cache line that is in state S, a bus transaction is necessary to invalidate all other cached copies; state E allows the line to be modified with no bus transaction.

Compared with the basic MSI protocol, the Modified, Shared and Invalid states of MESI correspond to the same three states of MSI, and Exclusive is the added state. Under MSI, a block read by a single processor is always loaded in the Shared state. A later write must then still issue an invalidate to upgrade the block to Modified, even when no other cache holds a copy. Adding the Exclusive state to MSI reduces the bus traffic caused by writes to blocks that exist in only one cache.
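The state rules above can be written as a small transition function, which also shows the saving the E state provides. This is a simplified sketch. The bus-action labels (`BusRead`, `BusReadExclusive`, `BusUpgrade`, `WriteBack`) are generic names chosen for illustration, not the signals of any particular bus, and the `others_have_copy` flag is an assumed model of the shared signal sampled on a miss. For a line no other cache holds, "read then write" costs one bus transaction under MESI (the read miss). Under MSI the read would land in S, so the write would need a second transaction.

```cpp
#include <cassert>

enum class Mesi { Modified, Exclusive, Shared, Invalid };

// Generic bus-action labels for this sketch (not tied to any particular bus).
enum class BusAction { None, BusRead, BusReadExclusive, BusUpgrade, WriteBack };

struct Result {
    Mesi next;         // state of this cache's line after the event
    BusAction action;  // bus transaction this cache issues or performs
};

// Local read. `others_have_copy` models the shared signal seen on a miss.
Result on_proc_read(Mesi s, bool others_have_copy) {
    if (s != Mesi::Invalid) return {s, BusAction::None};  // hit in M, E or S
    return {others_have_copy ? Mesi::Shared : Mesi::Exclusive, BusAction::BusRead};
}

// Local write: only M and E may be written without a bus transaction.
Result on_proc_write(Mesi s) {
    switch (s) {
        case Mesi::Modified:  return {Mesi::Modified, BusAction::None};
        case Mesi::Exclusive: return {Mesi::Modified, BusAction::None};        // silent E -> M
        case Mesi::Shared:    return {Mesi::Modified, BusAction::BusUpgrade};  // invalidate other copies
        case Mesi::Invalid:   return {Mesi::Modified, BusAction::BusReadExclusive};
    }
    assert(false && "unknown MESI state");
    return {s, BusAction::None};
}

// Another CPU reads the line: M writes back its data, and M or E drop to S.
Result on_snoop_read(Mesi s) {
    switch (s) {
        case Mesi::Modified:  return {Mesi::Shared, BusAction::WriteBack};
        case Mesi::Exclusive: return {Mesi::Shared, BusAction::None};
        default:              return {s, BusAction::None};
    }
}

// Another CPU invalidates the line (a write or read-for-ownership).
Result on_snoop_invalidate(Mesi s) {
    if (s == Mesi::Modified) return {Mesi::Invalid, BusAction::WriteBack};  // dirty data first
    return {Mesi::Invalid, BusAction::None};
}

// Bus transactions for "read, then write" of a line that no other cache holds.
int read_then_write_transactions() {
    int count = 0;
    Result r = on_proc_read(Mesi::Invalid, /*others_have_copy=*/false);
    if (r.action != BusAction::None) ++count;
    Result w = on_proc_write(r.next);
    if (w.action != BusAction::None) ++count;
    return count;
}
```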

6. For a snooping cache implementation, complete the following matrix for the operations given. The spaces in the spreadsheet between operations are not necessarily an indication of the content that must be provided.

7. For the same operations in a directory cache scheme, complete the matrix below. Again, the space in the matrix is no indication of the required space.

Group study with Alroy Fernandez, Anuj Mehta, Amita Shivangi
