Title : ARCHITECTURAL EFFECTS ON DSP ALGORITHMS AND OPTIMIZATIONS
Dogra, Sajal

BACKGROUND

Algorithmic optimizations that rely on dataflow perform differently when run on a von Neumann machine (control-driven sequential execution). I want to find out whether algorithmic optimizations match real-world architectural constraints.

Another motivation for this project comes from a belief that VLIW is probably not an optimum choice for embedded processing. VLIW machines make sense only in special-purpose, ultra-high-performance data streaming applications. Thus there is a genuine argument for DSPs to look more like general-purpose (GP) computers. General-purpose computers are superscalar, with multiple issue, out-of-order execution and deeper pipelining to reduce the iteration period. DSPs are VLIW machines with simpler hardware, and they require more compiler support. There has been a recent trend to introduce VLIW machines as general-purpose processors (Itanium, Transmeta), which have been unsuccessful. Itanium has relaxed issue rules (EPIC) and out-of-order memory support, which makes it akin to a superscalar machine.

PROPOSAL

Interesting algorithmic optimizations such as unfolding, loop transformation, loop-invariant code motion and loop peeling will be studied under the real-world constraints of a processor architecture. As parameters such as the branch prediction strategy, issue queue, reorder buffer, load/store queue (LSQ) and cache structure are varied, these optimizations might have undesirable effects. The idea is to identify architectural bottlenecks in multimedia processing. Both VLIW and superscalar architectures will be compared: do superscalar architectures outperform VLIW structures?

I have performed a detailed analysis of the MediaBench benchmark suite. Characteristics such as instruction mix, branch prediction accuracy, cache hit rate and memory usage were considered. This information may be useful when designing embedded systems targeted at multimedia applications.
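To make the optimizations named in the proposal concrete, here is a minimal C sketch of unfolding (unrolling), loop-invariant code motion and loop peeling applied to two small DSP-style kernels. The kernels, the function names (mac_baseline, mac_unfolded4, diff_baseline, diff_peeled) and the unfolding factor of 4 are illustrative assumptions, not code taken from MediaBench or from the simulator. The sketch only shows that each transformed loop computes the same result as its baseline; whether it runs faster on a given issue width, branch predictor or cache is exactly the open question of this project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline multiply-accumulate: the scale factor is recomputed on every
   iteration even though it never changes inside the loop. */
int64_t mac_baseline(const int *x, const int *h, size_t n, int gain, int step)
{
    assert(x != NULL && h != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t scale = (int64_t)gain * step;   /* loop-invariant value */
        acc += scale * x[i] * h[i];
    }
    return acc;
}

/* Same kernel after loop-invariant code motion (scale is hoisted) and
   unfolding by an assumed factor of 4 with independent accumulators. */
int64_t mac_unfolded4(const int *x, const int *h, size_t n, int gain, int step)
{
    assert(x != NULL && h != NULL);
    const int64_t scale = (int64_t)gain * step; /* hoisted out of the loop */
    int64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {                /* four iterations per pass */
        acc0 += scale * x[i]     * h[i];
        acc1 += scale * x[i + 1] * h[i + 1];
        acc2 += scale * x[i + 2] * h[i + 2];
        acc3 += scale * x[i + 3] * h[i + 3];
    }
    int64_t acc = acc0 + acc1 + acc2 + acc3;
    for (; i < n; i++)                          /* leftover 0..3 iterations */
        acc += scale * x[i] * h[i];
    return acc;
}

/* Baseline first difference: the boundary test runs on every iteration. */
void diff_baseline(const int *x, int *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i == 0)
            y[i] = x[i];                        /* no previous sample yet */
        else
            y[i] = x[i] - x[i - 1];
    }
}

/* Same kernel after loop peeling: the first iteration is pulled out,
   so the remaining loop body contains no branch. */
void diff_peeled(const int *x, int *y, size_t n)
{
    assert(x != NULL && y != NULL);
    if (n == 0)
        return;
    y[0] = x[0];                                /* peeled iteration */
    for (size_t i = 1; i < n; i++)
        y[i] = x[i] - x[i - 1];
}
```

The unfolded version trades code size and register pressure for fewer loop branches and more independent operations per iteration, and the peeled version removes a branch from the loop body. These are the kinds of trade-offs whose payoff may depend on the architectural parameters being varied. An optimizing compiler may already apply some of these transformations on its own, so a comparison would need to control for the optimization level.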
The simulator is built on SimpleScalar, an execution-driven simulator that models a detailed out-of-order processor based on the Alpha ISA. I am also using Wattch, an architectural power simulation tool that builds on the SimpleScalar suite. With it I would like to generate additional metrics such as the energy-delay and energy-delay^2 products. The question to ask is whether optimizing for power metrics leads to different design choices than optimizing for performance alone.
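As a rough illustration of why the choice of metric can matter, the following C sketch computes energy-delay (E x D) and energy-delay^2 (E x D^2) for a set of runs and picks the best configuration under each metric. It assumes that total energy and execution time have already been extracted from the simulator output. The run_result struct, the function names and the numbers in the usage note are hypothetical and are not Wattch data structures or measured results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one simulated configuration. */
typedef struct {
    const char *name;
    double energy_j;   /* total energy in joules (assumed unit) */
    double delay_s;    /* execution time in seconds (assumed unit) */
} run_result;

typedef double (*metric_fn)(const run_result *);

double metric_energy(const run_result *r) { return r->energy_j; }
double metric_delay(const run_result *r)  { return r->delay_s; }

/* Energy-delay product: E * D. */
double metric_edp(const run_result *r)
{
    return r->energy_j * r->delay_s;
}

/* Energy-delay^2 product: E * D^2, which weights performance more heavily. */
double metric_ed2p(const run_result *r)
{
    return r->energy_j * r->delay_s * r->delay_s;
}

/* Return the index of the run with the lowest value of the given metric. */
size_t best_by(const run_result *runs, size_t n, metric_fn metric)
{
    assert(runs != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (metric(&runs[i]) < metric(&runs[best]))
            best = i;
    }
    return best;
}
```

With two made-up configurations, one at 1.0 J and 2.0 s and another at 2.5 J and 1.0 s, the first has energy-delay 2.0 and energy-delay^2 4.0, while the second has 2.5 under both. The first configuration wins on energy and on energy-delay, and the second wins on delay and on energy-delay^2, so the preferred design changes with the metric.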