Hardware and Software Trade-offs in Embedded Systems
Micaela Serra and William B.
The course drew students from both our own department and the Department of Electrical and Computer Engineering, and was received well enough to be repeated immediately the following term.
In this paper, we explain our motivation, describe the course, and report on its results. The exposition is colloquial, as it reflects the script of the oral presentation at the conference itself.
The following are the major definitions which capture the essence of the area. Some of the definitions may seem to apply directly to any system design area. The difference, elaborated below, lies in the types of applications, and in the dynamic interaction of choices in allocating functionality between hardware and software in the final product.
Given a set of specified goals and an implementation technology, designers consider trade-offs in how hardware and software components work together. This leads to the need for more flexible design strategies, where hardware and software designs proceed in parallel with much feedback and interaction. Decisions are evaluated with respect to performance, programmability, area, power, development and manufacturing costs, reliability, maintenance, and design evolution, across the union of software and hardware.
Why is this topic important? The introduction of programmable hardware circuits as alternative computational units, and the flexibility they offer, has required different methodologies in the design process, reflecting the large number of choices. Today's computing systems deliver increasingly higher performance to end users; thus we require architectural support for operating systems, or particular hardware, to expedite application-specific software product evolution.
The new architectures based on programmable hardware circuits are usually geared to accelerating the execution of specific computations, or to emulating new hardware designs before they are committed to a specific, more expensive ASIC chip.
The key word here is "reconfiguration". Figure 1 represents a more utopian view, where codesign and codesign tools provide an almost automatic framework for producing a balanced and optimized design from some initial high-level specification. The goal of codesign tools and platforms is not to push towards this kind of total automation; designer interaction and continuous feedback are considered essential.
The main goal is instead to incorporate in the "black box of codesign tools" the support for shifting functionality and implementation between hardware and software, with effective and efficient evaluation. At the end of the process, either the prototypes or the final products are output, based on currently available platforms, software compilers, and commercial hardware synthesis tools.
Codesign as an area of research does not aim to reinvent the wheel of system design; rather, the necessary flexibility must be effectively incorporated and enhanced.
For example, in the design of a real-time system as a graduate project, a sub-path in the figure above may indeed be followed. The difference is that the designers are given predetermined choices of hardware and software allocation and must meet the timing constraints within the specifications.
Codesign introduces research into the trade-offs of that allocation, dynamically throughout the entire process. Embedded systems are application-specific systems which contain both hardware and software tailored for a particular task, and are generally part of a larger system. Examples range from the automotive field (ABS brake controllers and almost any other monitoring subsystem), through portable communication and computing systems (the Palm Pilot, any cellular phone), to the medical area (as in portable heart monitors).
Embedded systems often fall under the category of reactive systems, that is, systems containing sensors or similar elements which interact with the external environment. The components used usually include a general-purpose processor, one or more special-purpose processors, and some ASICs. For example, real-time systems are reactive systems which must also meet timing constraints.
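As a concrete illustration, a minimal sketch of such a reactive, time-constrained loop in C might look as follows. The platform hooks read_sensor(), actuate() and now_us(), and the 1 ms period, are hypothetical, chosen only to make the shape of the loop visible:

```c
#include <stdint.h>

/* Hypothetical platform hooks -- names are illustrative only. */
extern uint32_t read_sensor(void);          /* sample the environment   */
extern void     actuate(uint32_t command);  /* drive an external device */
extern uint32_t now_us(void);               /* microsecond timestamp    */

#define PERIOD_US 1000u                     /* assumed 1 ms time constraint */

void control_loop(void) {
    for (;;) {
        uint32_t start  = now_us();
        uint32_t sample = read_sensor();    /* react to a sensor input  */
        actuate(sample > 512u ? 1u : 0u);   /* trivial decision rule    */
        /* a real-time system must finish within its period, every time */
        while ((now_us() - start) < PERIOD_US)
            ;                               /* idle until the next period */
    }
}
```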
A major issue in an embedded system is to provide design approaches that scale up, without a total redesign of a legacy product. Finally, the interdisciplinary aspect between computer science and computer engineering, and the intradisciplinary aspect within computer science, pose a challenge to us as educators and pivots for effective technology transfer: how to provide senior and graduate students with a framework such that they can learn to work concurrently.
In the conventional design process, the hardware and software split of components is decided early, usually on an ad hoc basis, creating what is commonly called a "Model Continuity Problem".
Figure 2 shows graphically the two paths, leading to a final system integration, with no reconfiguration choices shown after the initial split.
Model continuity is important, and the consequences of losing it are significant. One of the labels given to a class of solutions is based on the concept of a "Unified Design Environment", as shown graphically in Figure 3, where it is emphasized that hardware design and software design use the same integrated infrastructure, resulting in an improvement in overall system performance, reliability, and cost effectiveness.
It is easy to draw such a picture and assign grandiose labels. Yet here the triangles shown spanning the two paths and covering the integrated substrate do not refer to mere feedback sessions and designers' meetings!
They represent, at a minimum, an integrated database with powerful tools which can support the exploration, prototype implementation, and rapid evaluation of the repartitioning of functionality between hardware and software, together with an essential and extremely effective cosimulation platform. Two questions frame the main research areas: why can't CAD tools with successive simulation suffice? And why can't rapid prototyping systems suffice?
A few points are as follows. The system design space is very large, if not unlimited, and the designer should be able to manipulate many of the choices during the design. A typical rapid prototyping system consists of a set of boards, and these boards usually run a real-time operating system.
Thus rapid prototyping may shorten the design path, but it does not answer basic questions, such as where to divide the functionality; it is how the decision is made that is of research interest! Codesign tools allow the designer to avoid local maxima by enabling design space exploration. Given the emphasis placed on interaction and the need for reconfiguration during the whole design process, we can summarize in Figure 4 the "ideal" process flow that codesign wants to support.
The red "interaction and feedback" arrow is the crucial part. Another important aspect is the central "Interface" component, which in normal system design is often left on the sideline, causing disastrous effects at the time of integration. Many embedded systems which use codesign methodologies are implemented at a very low level of programming and detail.
The codesign CAD tools of point 7 are distributed throughout the course with introductory labs and design assignments. The items in points 2, 3 and 6 represent a summary of hardware and VLSI design, which is much needed by both the computer science and, often, the computer engineering students. The labels may look ambitious, but clearly only a survey of the topics is given. The difficulty is indeed in finding the necessary balance between depth and breadth.
The reward was that all students found these subjects particularly interesting and extremely "empowering": they gained the confidence that they could hold their own in almost any design environment, by being able to grasp the essentials and know where to find more detailed information on system, software and hardware issues. The software tools we introduced proved to be a pivot for the students' experience, as they gave them direct exposure and the power to implement significant projects (see below).
The first framework introduced is Ptolemy, distributed free from UC Berkeley and also available commercially. Ptolemy models systems with functional building blocks (stars, galaxies) and is ideal as a design platform at a high level, as it contains choices for many subdomains.
Ptolemy can also be useful for other courses not necessarily on codesign, but focused on more general design, as it provides a sturdy platform for models based on Petri nets, CSP, etc. It contains tools for simulation and cosynthesis, but it is weak on partitioning and reconfiguration.
The second framework models a design using SpecCharts, which supports concurrency, state transitions, hierarchy and synchronization; SpecSyn takes care of design space exploration and cosynthesis in a very effective manner. Thus it is well suited to both computer science and computer engineering students, who find themselves very comfortable with a high-level programming language. As an aside, VHDL is a new item for most students, and, notwithstanding any industrial or academic discussion as to its advantages and disadvantages, it proved to be fascinating and powerful in its flexibility (the behavioral, structural and dataflow paradigms that VHDL allows).
Last, but not least, programmable hardware, in the form of FPGAs, is introduced in the course, but it is beyond the scope of this presentation. The use of FPGAs is the first step towards the research area called "Configware", referring to dynamically reconfigurable subsystems. A short introduction to VHDL as a programming language is also given through one lecture and an assignment.
Students and Research
One major part of the course is presenting to students current research projects developed locally in the area, and the evaluation of a student project of their choice.
Students can choose a design project or a literature search on a topic beyond the scope of the lectures. A small selection of the most interesting design project titles to date: hardware acceleration for data compression (another personality inside a large system), and a rowing coach assistant (a Palm Pilot for rowing coaches). On the side of "technology transfer", three research projects were described to students; for each of these research topics, the papers and related literature were presented.
In summary, the following points became clear in the overall evaluation of the course itself, the impact on the students and curriculum, and the students' level of satisfaction.
The interdisciplinary aspect between Comp. Science and Comp. Engineering, and the intradisciplinary aspect within Comp. Science, was somewhat new and a source of great strength from all points of view. The hardware-related topics were tremendously empowering to the mainly software students in Comp. Science, who found the demystification of the whole area of VLSI design and CAD software useful to their breadth, especially towards technical jobs in smaller companies.
It is fun to see a whole system designed (small and embedded) and to use state-of-the-art tools. Also, it is fun to see all sides of the design process. It is fun to hear about how circuits are built and fabricated (take a few grains of sand and here is your chip!).
The course required a wide range of skills - a true integration of Comp. Science streams and a push towards breadth. The Web page related to the course is currently still available at http:
Higher performance with more bang for the buck is today's microprocessor game. We have the architectural expertise and technology to design radically new microprocessors, to craft new and sophisticated ISAs (Instruction Set Architectures). Instead, the trend is to extend existing ISAs, giving performance boosts to current microprocessors.
These extensions would be so much unused silicon if not for assembler and compiler support. It is the development software that ensures effective employment of new ISA features. Thus ISA extensions appear as software libraries or compiler enhancements for programmer and compiler use. Cygnus Solutions provides development tools and compilers for over 30 architectures and many different microprocessors, a number of which incorporate ISA extensions.
ISA extensions combine the best of two worlds. For example, if you have a project and you choose a 68K or a SPARC, you have a wide range of operating, development and application software available for use. New architectures need time to build up an equivalent software base. Extensions let microprocessor architects incrementally add new technology as it comes on line.
Some ISA extensions now emerging include compressed 16-bit instruction sets, DSP support, Java execution engines, and partitioned pixel-processing operations. A compressed ISA shrinks the instructions; however, it keeps the 32-bit datapath. This tactic cuts RISC code bloat, especially for code-intensive applications, but keeps 32-bit data processing capabilities. The dynamic part is that programs can switch from the 16-bit to the 32-bit ISA as needed while running; the code is compiled for mixed 16- and 32-bit ISA operation. The DSP extensions include adding MAC (multiply-accumulate) instructions and other iterative, math-oriented processing.
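As a small illustration of what a MAC extension accelerates, here is the classic dot-product kernel in plain C (a sketch; whether the loop body maps onto a single MAC instruction is up to the compiler or an intrinsic):

```c
#include <stdint.h>

/* Multiply-accumulate kernel of the kind DSP extensions target: with
   MAC support, the loop body collapses to one instruction per sample. */
int32_t dot_product(const int16_t *a, const int16_t *b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];   /* multiply, then accumulate */
    return acc;
}
```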
Java Software Engine
Java is a software extension to hardware. Java started out with a byte-code interpreter in its execution engine; compiled versions of Java are emerging to deliver higher, C-level performance. The GNU tools are highly modular and designed to support a wide range of architectures and processors, and the tool flow shows where ISA extensions are added. ISA extensions do more than just add a "mess" of instructions and leave their use to assembly writers and compiler folk.
These new instructions are designed to work together, to support one another, and to provide new capabilities to the existing ISA. For example, for DSP extensions, the new instructions were created to allow developers to set up and iterate through a DSP loop, accumulating values for a series, or for matrix or vector operations.
The "model" in this case includes setting up addressing pointers, setting loop values and running an iterative DSP processing loop. The key to efficient DSP operation is to minimize "bookkeeping" operations and run very tight inner loop code. The "computing model" provides the model of behavior that programmers can emulate to craft efficient code. The compiler is also structured to optimize that "computing model" or way of coding. This goes beyond the traditional compiler optimization, which is to optimize individual instructions or threads of instruction usage.
Computing models make the point that ISA extensions are more than a few new instructions. Instead, they are a collection of operations that collectively support a model of processing.
The closer library and application code follow those models, the more efficient code execution will be. To shrink 32-bit instructions to 16-bit instructions, the number of instructions and the number of registers referenced are reduced: smaller instruction word fields mean fewer instructions and fewer register resources. Fewer register resources mean that the compiler must be register-stingy, and users should minimize defined register use.
The benefits from using ISA extensions and deploying them in assembled and compiled code are not trivial. These extensions can deliver a great boost in performance or usability for specific application classes. Java is not primarily about raw performance; instead, it provides a much more controlled, structured, object-oriented development and runtime environment.
These ISA extensions allow developers to stay with standard microprocessor architectures, yet get critical performance boosts or usability boosts for their applications. Many consider the 68K to set the standard for acceptable embedded system code density.
The 68K design team made a number of very effective architectural decisions, decisions that have proved themselves over time. The key design parameters for the 68K were that it had a 32-bit data path (originally 32-bit registers with a 16-bit ALU and 16-bit memory interface) and 16-bit instructions.
Most instructions fit into 16 bits, with extensions for addressing and immediate data. This combination of 16-bit instructions and 32-bit data gave the 68K 32-bit processing power with roughly 16-bit ISA code density. RISC designers, chasing speed, minimized the instruction set -- the fewer the instructions, the less the logic depth, and thus the faster the execution.
The smaller instruction word has smaller fields. Thus it can only encode a smaller set of instructions using a smaller register set, since the opcode field and register fields must shrink to create 16-bit instructions.
They came up with Thumb, which took a very clever approach to running a 16-bit ISA. The trick was to pick a 16-bit subset of the 32-bit ARM ISA, with fewer registers and instructions to reduce the field sizes. The 16-bit instructions are retrieved and decoded; the decoded instructions are then passed through an expansion block that expands the decode to the 32-bit form, and the decoded 32-bit instructions are passed on to the next stage in the pipeline.
The MIPS16 takes a similar approach. Figure 2 shows the decode expansion block that expands the 16-bit instructions to 32-bitters.
Both extensions have fixed 16-bit ISAs and a 32-bit datapath. A mode or status bit selects 16- or 32-bit ISA mode. That bit cannot be set at just any time during execution; it can only be set during a call or return, ensuring that 16- or 32-bit ISA operation is defined at the function or subroutine level, or higher. They run compiled code, and the compiler compiles 16- or 32-bit ISA code: a switch or flag indicates whether to compile a function or file as 16- or 32-bit code.
The decision of what is 16- or 32-bit ISA code is made at compile time, not at runtime. However, the generated code will set the proper ISA mode bit when switching from 16- to 32-bit code or vice versa.
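On a GNU-style MIPS16 toolchain that split can be expressed per file (building with -mips16 or -mno-mips16) or, on toolchains that support it, per function. The attribute names below follow GCC's MIPS documentation, but treat the example as illustrative rather than as the article's own toolchain:

```c
/* Per-function ISA selection, assuming GCC-style mips16 attributes.
   The compiler and linker insert the mode-bit handling at every
   cross-ISA call. */
int hot_filter(int x) __attribute__((nomips16)); /* speed-critical: 32-bit ISA */
int cold_setup(int x) __attribute__((mips16));   /* size-critical: 16-bit ISA  */

int hot_filter(int x) { return (x * 3) >> 1; }
int cold_setup(int x) { return x + 1; }
```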
To fit into 16 bits, the MIPS16 uses 5 bits for its major opcode and function fields (they were 6 bits), 3 bits for the register fields (they were 5 bits), and it shrinks the immediate value field from 16 to 5 bits. The 16-bit ISA's restricted register set directly affects the compiler's register allocation strategy. With the normal MIPS ISA, when the compiler allocates registers for a complex expression, it normally feels free to allocate a new register any time one is needed.
Usually there are enough registers; if too many registers are needed, some values are merged or spilled to the stack. Other changes include adding a special stack pointer instead of using a general register: the MIPS16 has only 8 general registers, too few to give one up as a designated stack pointer.
JALX jumps, saves the next address in a link register, and is used for subroutine calls; JALR returns to the address in the link register. The ISA mode bit can also be changed when an exception occurs. Generally, exceptions are handled by the system in full 32-bit ISA mode. Floating point becomes a problem, since the MIPS calling convention is meant to pass floating-point values in the floating-point registers.
The MIPS16 code cannot access that value. We fixed the problem by making sure that the compiler emits a 32-bit stub for any 16-bit function that takes a floating-point argument. The stub copies the floating-point register into the regular argument registers and then jumps to the 16-bit function. The linker arranges for all calls from 32-bit functions to go through the 32-bit stub.
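Conceptually the stub just moves the bits from the floating-point register file into the integer argument registers. A hedged C rendering of that behavior (the real stub is a few instructions of 32-bit assembly emitted by the compiler; mips16_body_raw is a hypothetical name for the 16-bit entry point):

```c
/* Illustrative model of the compiler-emitted 32-bit stub. */
typedef union { double d; unsigned int w[2]; } fpbits;

extern double mips16_body_raw(unsigned int hi, unsigned int lo); /* hypothetical */

double fp_entry_stub(double x)  /* built as 32-bit code; x arrives in an FP register */
{
    fpbits b;
    b.d = x;                                 /* copy the FP value ...            */
    return mips16_body_raw(b.w[0], b.w[1]);  /* ... into integer argument registers */
}
```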
If the 32-bit stub is never called, the linker deletes it. When a 16-bit function calls a function with a floating-point argument, the compiler adds a similar stub. The linker arranges the function call to go through the stub when calling a 32-bit function; it will go directly to a 16-bit function. Someone once noted that many applications manage to turn 32-bit microprocessors into 8-bit processors.
Unfortunately, that's true for many standard programming tasks. String and character operations involve a 32-bit processor searching for an 8-bit character value, one byte at a time. Yes, you can speed it up by doing word-wide AND compares and testing for matches, but it still takes processing time.
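One classic word-wide version of that speed-up, sketched in plain C using the well-known "has-zero-byte" mask (alignment handling kept simple via memcpy), tests four bytes per 32-bit compare:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Nonzero iff some byte of w is zero. */
#define HASZERO(w) (((w) - 0x01010101u) & ~(w) & 0x80808080u)

const char *find_byte(const char *p, size_t n, char c) {
    uint32_t pat = 0x01010101u * (uint8_t)c;   /* replicate c into every byte */
    while (n >= 4) {                           /* one AND-compare per 4 bytes */
        uint32_t w;
        memcpy(&w, p, 4);                      /* alignment-safe load */
        if (HASZERO(w ^ pat)) break;           /* a byte matched: pinpoint it below */
        p += 4; n -= 4;
    }
    for (; n; p++, n--)                        /* finish (or locate) byte by byte */
        if (*p == c) return p;
    return NULL;
}
```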
Another processing sink for inefficient cycles is 8-bit or 16-bit pixel processing on a 32-bit or 64-bit data path. Rarely can such processing use the full datapath bandwidth of the CPU; instead it throttles down, processing at the graphic field size, and wastes datapath resources. Partitioned operations fix this: for a 64-bit system, one instruction can do eight 8-bit adds, compares, or logical ANDs, for example (sketched below).
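In plain C the same partitioned add can be modeled with mask arithmetic; the real extension does this in one instruction, and this SWAR version only shows the semantics:

```c
#include <stdint.h>

/* Eight independent 8-bit adds packed into one 64-bit operation:
   carries between lanes are masked off, then the top bits restored. */
uint64_t add8x8(uint64_t a, uint64_t b) {
    uint64_t low = (a & 0x7f7f7f7f7f7f7f7fULL) + (b & 0x7f7f7f7f7f7f7f7fULL);
    return low ^ ((a ^ b) & 0x8080808080808080ULL);
}
```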
That is an 8X speedup. Cygnus Solutions was the first to produce a commercial compiler for V9 processors. The V9 processor can issue up to four instructions per pipeline cycle and has four execution units. The floating-point and graphics operations, except for divide and square root, are fully pipelined.
They can complete out of order without stalling the execution of other FGops (floating-point and graphics operations). It also has 4 sets of FP condition code registers for more parallelism. Pixel information is typically stored as four 8-bit or 16-bit integer values; typically these four values represent the red, green, blue (RGB) and alpha (a) information for the pixel.
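A small sketch of that layout (illustrative field order; real pixel formats vary) shows how the four 8-bit components pack into one 32-bit word, so a 64-bit partitioned instruction can process two pixels at once:

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } pixel;  /* red, green, blue, alpha */

/* Pack the four 8-bit components into one 32-bit word. */
static inline uint32_t pack_rgba(pixel p) {
    return (uint32_t)p.r | ((uint32_t)p.g << 8) |
           ((uint32_t)p.b << 16) | ((uint32_t)p.a << 24);
}
```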