Write-back vs Write-Through

caching,computer-architecture

The benefit of write-through to main memory is that it simplifies the design of the computer system. With write-through, the main memory always has an up-to-date copy of the line. So when a read is done, main memory can always reply with the requested data. If write-back is used, sometimes...

Function to calculate a value inside a Verilog generate loop

verilog,computer-architecture,digital-logic

When I change to use a function I get the error Packed dimension must specify a range. I think you need to think about your partProds width and connections. Using a function: module bcd_mult_1_n #( parameter N = 8 ) ( input [N * 4 - 1:0] num1, input [N...

what is target architecture in computer science?

gcc,architecture,compiler-construction,computer-architecture

The target architecture is the architecture which the compiler creates binary files for. Common architectures are: i386 (Intel 32-bit), x86_64 (Intel 64-bit), armv7, arm64, etc... GCC compiles C code (after the preprocessing stage) to assembly code, and the assembly code varies depending on the CPU architecture. The assembly code is...

Computer Architecture/Assembly, Amdahl's Law

assembly,computer-science,computer-architecture

Maybe a picture helps: [ASCII diagram, flattened in extraction: on 1 core, a non-parallelizable part (1 - Q) is followed by a parallelizable part Q; on n cores, the parallelizable part is divided n times]...
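The picture corresponds to the usual statement of Amdahl's law. As a minimal sketch (the function name `amdahl_speedup` is made up for illustration, and Q is taken to be the parallelizable fraction as labeled in the picture):

```python
def amdahl_speedup(q: float, n: int) -> float:
    """Speedup on n cores when a fraction q of the work is parallelizable."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - q) is unchanged; the parallel part shrinks to q / n.
    return 1.0 / ((1.0 - q) + q / n)


if __name__ == "__main__":
    # With half of the work parallelizable, adding cores helps less and less.
    for cores in (1, 2, 4, 8):
        print(cores, round(amdahl_speedup(0.5, cores), 3))
```

No matter how large n gets, the speedup stays below 1 / (1 - Q), because the non-parallelizable part never shrinks.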

Defining locality in pseudocode

computer-architecture

Generally, given only a code snippet, one can't easily determine spatial locality unless the whole code is given. Temporal locality refers to the reuse of specific data and/or resources within a relatively small time duration, whereas spatial locality refers to the use of data elements within relatively close storage locations....

Shouldn't R3 hold address x3307?

assembly,cpu-registers,computer-architecture,machine-code,lc3

PC-relative offsets are applied on top of the already incremented PC, that is the "after" value of the PC, or in other words, the address of the following instruction.
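To make the rule concrete, here is a small sketch of the address calculation for a 16-bit, word-addressed machine such as the LC-3. The helper names and the example addresses are made up for illustration and are not taken from the question:

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def pc_relative_target(instr_addr: int, offset_field: int, bits: int = 9) -> int:
    """Effective address of a PC-relative operand.

    The offset is added to the already incremented PC, that is, to the
    address of the instruction that follows the one being executed.
    """
    incremented_pc = (instr_addr + 1) & 0xFFFF
    return (incremented_pc + sign_extend(offset_field, bits)) & 0xFFFF


if __name__ == "__main__":
    # An instruction at x3000 with an offset of +5 refers to x3006, not x3005.
    print(hex(pc_relative_target(0x3000, 0x005)))
```

An offset of -1 (0x1FF as a 9-bit field) therefore points back at the instruction itself.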

Why doesn't MASM work on Mac OS X?

assembly,x86,operating-system,hardware,computer-architecture

Programs are made up of more than just the raw machine code. The executable needs to have a special format that the OS can understand, so it can load and run the code. Also, the code expects a certain environment, such as libraries and system calls (along with the appropriate...

Bit-wise operations on addresses

bit-manipulation,bit,computer-architecture

Think of it this way - a bit can only have 2 possible values i.e. 0 or 1. If you write down the binary representations, you will end up with something like the below: abcd efgh & 0011 1111 ----------- = 0010 0111 Going by the definition of bitwise AND,...
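The same masking can be tried out directly. This is only a sketch: the input value is one arbitrary choice for abcd efgh that is consistent with the result shown above, and the function name is hypothetical:

```python
MASK = 0b0011_1111  # keeps the low six bits, clears the top two


def keep_low_six_bits(value: int) -> int:
    """Bitwise AND with the mask: a result bit is 1 only if both inputs are 1."""
    return value & MASK


if __name__ == "__main__":
    value = 0b1110_0111  # hypothetical abcd efgh with the low six bits 10 0111
    print(format(keep_low_six_bits(value), "08b"))
```

Whatever the top two bits a and b were, they come out as 0, while c through h pass through unchanged.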

How many bits are in the address field for a directly mapped cache?

caching,system,cpu,computer-architecture,cpu-cache

After watching this Caching Video, I was able to figure this out. (Highly recommend this video.) Please correct me if any of this is wrong. In total the address has 64 bits. Now for the different components of the cache address: Byte - There are 64 bytes in a byte...

Why do Computers use Hex Number System at assembly language?

assembly,architecture,computer-science,computer-architecture,hex

Well, it doesn't make a difference how you represent them, but humans don't read binary numbers easily; binary only makes the computer's life easier, since it works with just two states, true and false. So in order to make binary numbers (instructions) human readable we adapted...

Computer architecture - How to find the addresses in a block

computer-science,computer-architecture

In your configuration a memory block is made of 16 words. Let me also assume that a word is 4 bytes and the memory is byte addressable. 1 Block = 16 words = 64 bytes Block numbers usually grow with memory addresses, that is: Block Address Range Block #0 [0,...

Cache and scratchpad memories

caching,arm,computer-architecture

A scratchpad is just that: a place to keep some stuff. Cache is memory you talk through, not normally memory you talk at. A scratchpad is like a Post-it note, something you write something on and keep with you. Cache is paper you send off to someone else with instructions like a...

Filling up Delayed Branch slots

assembly,computer-architecture

For (A), J X is an unconditional jump to a single fixed label (presumably within the range provided by the ISA), so it does not have any data dependencies on previous instructions. Since the ADD has a name dependence on the result of the OR, if it is to be...

How to disassemble a compiler generated code?

c++,debugging,gdb,computer-architecture,vtune

If rescheduling has been done by the compiler, you really should see that when disassembling in gdb. Otherwise you can perhaps use objdump directly on the command-line, that's my preferred way of seeing code in an ELF: $ objdump --disassemble a.out | less It doesn't reference the source at all,...

Why am I getting an “expected register or immediate value” error?

assembly,cpu-registers,computer-architecture,machine-code,lc3

You're getting this error because the LC3's state machine only has two versions of the ADD command. ADD R1, R2, R3 ADD R1, R2, #7 You can see that we can add registers together or we can use ADD immediate. ADD immediate is where we use the last operand as...
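For illustration, the two ADD forms can be encoded by hand. This sketch assumes the standard LC-3 instruction layout (opcode 0001, bit 5 selecting the immediate form, and a 5-bit two's complement immediate); the function names are made up:

```python
ADD_OPCODE = 0b0001


def _check_register(reg: int) -> None:
    if not 0 <= reg <= 7:
        raise ValueError("LC-3 registers are R0 to R7")


def encode_add_register(dr: int, sr1: int, sr2: int) -> int:
    """ADD DR, SR1, SR2: bit 5 is 0 and bits 4..3 are unused (zero)."""
    for reg in (dr, sr1, sr2):
        _check_register(reg)
    return (ADD_OPCODE << 12) | (dr << 9) | (sr1 << 6) | sr2


def encode_add_immediate(dr: int, sr1: int, imm: int) -> int:
    """ADD DR, SR1, #imm: bit 5 is 1 and bits 4..0 hold the immediate."""
    _check_register(dr)
    _check_register(sr1)
    if not -16 <= imm <= 15:
        raise ValueError("the immediate must fit in 5 bits (-16 to 15)")
    return (ADD_OPCODE << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | (imm & 0x1F)


if __name__ == "__main__":
    print(hex(encode_add_register(1, 2, 3)))   # ADD R1, R2, R3
    print(hex(encode_add_immediate(1, 2, 7)))  # ADD R1, R2, #7
```

There is no encoding in which the last operand is anything other than a register or a small immediate, which is why the assembler rejects other operand types.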

Direct-mapped instruction cache VS fully associative instruction cache using LRU replacement

algorithm,caching,computer-architecture

This can happen for caches of small size. Let's compare caches of size 2. In my example, the directly-mapped "DM" cache will use row A for odd addresses, and row B for even addresses. The LRU cache will use the least recently used row to store values on a miss....
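A quick simulation shows the effect. This is only a sketch under the setup described above (two rows, direct mapping by odd/even address); the access trace is an invented example, not the one from the question:

```python
from collections import OrderedDict


def direct_mapped_misses(addresses):
    """Two-row direct-mapped cache: the row is chosen by address parity."""
    rows = {}  # row number -> address currently stored in that row
    misses = 0
    for addr in addresses:
        row = addr % 2
        if rows.get(row) != addr:
            misses += 1
            rows[row] = addr
    return misses


def lru_misses(addresses, capacity=2):
    """Fully associative cache that evicts the least recently used entry."""
    cache = OrderedDict()  # oldest entry first
    misses = 0
    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used
            cache[addr] = True
    return misses


if __name__ == "__main__":
    trace = [0, 1, 2] * 3  # a loop over three addresses
    print("direct-mapped misses:", direct_mapped_misses(trace))
    print("LRU misses:", lru_misses(trace))
```

On a loop that is one entry larger than the cache, LRU always evicts exactly the entry that is needed next, while the direct-mapped cache keeps address 1 in its own row and hits on it.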

FPGA verilog code upload speed and size limit

fpga,computer-architecture,cpu-architecture

This would depend on the particular CPU, but to give you an idea a Xilinx MicroBlaze Soft Processor Core takes up around 1000 logic cells, maybe up to around 6000 logic cells with peripherals. A high end FPGA like the Xilinx Zynq-7100 has 444K logic cells. Configuring an FPGA is...

Instruction Execution

assembly,verilog,computer-architecture

You can also google "RISC architecture", refer to the Wikipedia page on the RISC DLX architecture, or refer to the book by Hennessy and Patterson, which discusses the RISC architecture in detail. Normally you have a pipelined RISC architecture (fetch - decode - execute - write back). Instructions are executed...

I'm struggling with writing the truth table for this state diagram for jk flip flops

computer-architecture,flip-flop,state-diagram

The state transition diagram (STD) from your post is simply outlining the possible states, the outputs for each state and the transition conditions possible between the states. In the posted STD, there are 4 states, S0, S1, S2 and S3. That means the system can be in any of state...

Internal operations for a PC to copy data from an external USB drive to an internal HD? [on hold]

architecture,storage,hardware,computer-architecture

If this is a homework question the following would suffice; else the following is missing too many details and you should not use it. When you plug in a USB drive, the USB driver presents the USB drive as a device that can read/write files. To copy files, the PC Operating...

minimum number of bits in a microinstruction to specify two kinds of microoperations?

computer-architecture

This question is not asked very clearly and had I received it on a test I would have asked for clarification, but this is how I read it; The first condition (None or any one out of 3 operations) gives 4 variations to consider. This uses 2 bits. The second...

Modified booth multiplication algorithm

verilog,computer-architecture

I think there's a problem with your method - I ran the following testbench on your code, which cycled through 1-9 for both a and b. I'd suggest giving it a try: module tb(); reg clk; reg [7:0]a; reg [7:0]b; wire [15:0] p; integer i,j,k; multiply inst (p,a,b,clk); initial clk...

How many nand gates does a computer actually need to operate?

hardware,boolean-logic,computer-architecture

You can't just shop for NAND gates; the ones sold individually are for hobbyists or other markets. The NAND gates in your computer's processor are not individual components but are lithographed directly onto the die; they are a few nanometers in size and there are billions of them on...

Assembly 8088: 32-bit signed Multiplication with addition and bit manipulation using 16-bit registers

assembly,32-bit,computer-architecture,8086

multiplicand dd 0,9000008 ; 32bit multiplicand 64bit space You set up 64-bit space but your code modifies 128-bit space! Also, because of little-endianness you should swap the dwords of the multiplicand....

What are the performance and architectural differences between PCIe and QPI?

computer-architecture,low-latency,pci-e

The two have fairly different messaging types due to their different roles. QPI is directly concerned with implementing cache coherency via the MESIF protocol and NUMA via a distributed directory. PCIe has no such notions, although they share in common memory read and write and completion message types (see here...

Hazard Type - Computer Architecture

computer-architecture

This is a RAW hazard as you're Writing data into R2 & R3 and then Reading those values in the SUB command. So (R)ead (A)fter (W)rite...

Determining Architecture In Makefile.am using Automake

autotools,automake,computer-architecture

Selecting the CPU and OS is better handled in configure.ac, which has AC_CANONICAL_HOST which parses the output from uname and puts it in a standard format: configure.ac ... AC_CANONICAL_HOST WRAPPER_CPPFLAGS="" AS_CASE([$host_os], [linux*], [ WRAPPER_CPPFLAGS="$WRAPPER_CPPFLAGS -DLINUX" AS_CASE([$host_cpu], [x86_64], [ WRAPPER_CPPFLAGS="$WRAPPER_CPPFLAGS -DAMD64 -I../../../external/xerces-c-3.1.1-x86_64-linux-gcc-3.4/include" ], [i?86], [ WRAPPER_CPPFLAGS="$WRAPPER_CPPFLAGS -I../../../external/xerces-c-3.1.1-x86-linux-gcc-3.4/include" ]) ],...

How does LEA instruction store address of A?

assembly,load,cpu-registers,computer-architecture,lc3

Anytime you see a PCoffset as an opcode operand LEA R2, A ; Loads the memory location of A into R2 ; LEA, DR, PCoffset9 It's telling you that when your code is assembled it doesn't actually place the label 'A' into your LEA command. Take the following code: .ORIG...

PCI BAR memory addresses

memory,computer-architecture,bios,pci

The OSDev site is OK. They describe memory/IO BARs from the PCI device perspective, not from the host perspective. So what OSDev is saying is that memory BARs can be (but not necessarily are) mapped to physical RAM on the PCI device, while IO BARs are usually something else (registers, FIFO, whatever). Please also...

Understanding Direct Mapped Cache

caching,memory,memory-management,computer-architecture,cpu-cache

You have the right idea, but your numbers went wrong somewhere. In your example you have a direct-mapped cache of 4 blocks/lines of 16 bytes/cells each. The address 1100100111 will be divided up as follows. You use the least significant four bits 0111 as the offset because it refers to...
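The split can be checked with a few shifts and masks. This sketch assumes the usual direct-mapped layout of tag, then index, then offset; with 4 lines of 16 bytes that means 2 index bits and 4 offset bits. The function name is made up:

```python
def split_address(address: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    tag, index, offset = split_address(0b1100100111, offset_bits=4, index_bits=2)
    print(format(tag, "b"), format(index, "02b"), format(offset, "04b"))
```

For the address 1100100111 this yields the offset 0111 from the lowest four bits, with the next two bits selecting the line and the remaining upper bits forming the tag.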

Which are the operands in Lc3 instruction?

assembly,cpu,cpu-registers,computer-architecture,lc3

"operands (who the computer is expected to do it to)." - pg116 "The LD instruction requires two operands." - pg180 The LD command requires both a register and a label. The label is technically a PCoffset9. "operands can be obtained from registers, from memory, or they may be literal" -...

How many words can be in the address space?

64bit,cpu,memory-address,computer-architecture,processor

Thanks to aruisdante's comment, I was able to figure this out. Basically 64 bit addresses means there are 2 ^ 64 total addresses. Because byte addressable memory is used here, each address will store one byte. This means that in total, in the address space, 2 ^ 64 bytes can...

Number of Prime Implicant and EPI

computer-architecture,digital-logic,vlsi,digital-design

In order to provide you more of a learning opportunity, I will do the process for determining PI and EPI graphically on another function, similar to yours. You can use the exact same method to then solve the numbers for the function you gave in your question. Note, there are multiple...

HACK Machines and its assembler

assembly,computer-architecture

I have never heard of the HACK machine before, so I looked for it on Google, found the official page, read a couple of PDFs and made a simple test. All this just to say: I have 5 min of preparation on this subject :) Virtual registers are just symbols, names...

How do i calculate the size of a tag field?

computer-architecture,cpu-cache

You need 32 bits for the address. You need 6 bits for the offset within a block. You need 10 bits to identify one of the 1,024 possible blocks in the cache. That's 16 bits in total. Therefore the tag needs to be 32 bits - 16 bits = 16...
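The arithmetic generalizes to other sizes. A minimal sketch, assuming a direct-mapped cache with power-of-two sizes; the 64-byte block size is inferred from the 6 offset bits mentioned above, and the function name is made up:

```python
def cache_field_widths(address_bits: int, block_size_bytes: int, num_blocks: int):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache."""
    offset_bits = block_size_bytes.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    if (1 << offset_bits) != block_size_bytes or (1 << index_bits) != num_blocks:
        raise ValueError("block size and block count must be powers of two")
    # Whatever is left of the address after index and offset is the tag.
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


if __name__ == "__main__":
    print(cache_field_widths(32, 64, 1024))
```

For a set-associative cache the index would select a set rather than a block, so the number of sets would take the place of `num_blocks`.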

VGA and integrated graphics theory

memory,graphics,intel,computer-architecture,vga

It has been quite a few years since I wrote something directly for VGA, so keep that in mind. The old legacy stuff (CGA/EGA/VGA) mapped all VRAM memory access to two segments only (2 x 64 KByte): 1. graphic modes A000:0000 - A000:FFFF 2. text modes B800:0000 - B800:FFFF So these two 64 KByte chunks...

Why is the address space increased from 32 to 36 bits with PAE

paging,computer-architecture,ia-32,page-tables

The physical address space can be 36 bits. The linear address space is always 32 bits in IA-32. It's achieved by increasing the size of the page directory pointer table entries, page directory entries and page table entries. They are all 64 bits with PAE paging enabled. And actually with PAE...

For a Single Cycle CPU How Much Energy Required For Execution Of ADD Command

computer-architecture,cpu-architecture,energy

OK, this was going to be a long answer, so long that I may write an article about it instead. Strangely enough, I've been working on experiments that are closely related to your question -- determining performance per watt for a modern processor. As Paul and Sneftel indicated, it's...

Data locality relevance with The Machine and memristors?

caching,optimization,computer-architecture,scientific-computing,supercomputers

Memristors will be used as a replacement for SRAM cells. Even though they might increase the memory density/area and bring along improvements in power efficiency, I do not see them changing the concept of data locality as that is an abstract concept. Yes, it will lead to an increase in...

BCD adder and Decimal Output

computer-science,computer-architecture,digit,bcd,vlsi

A Mod 4 in the original equation gives the two least significant bits of the BCD digit A. So, let's consider the two most and least significant bits of A separately, like so: X = A AND 3 = A Mod 4 (two least significant bits) Y = A AND...

Is there a code that results in 50% branch prediction miss?

c++,c,performance,compiler-optimization,computer-architecture

If you know how the branch predictor works you can get to 100% misprediction. Just take the expected prediction of the predictor each time and do the opposite. The problem is that we don't know how it is implemented. I have read that typical predictors are able to predict patterns...

Interpretation of perf stat output

performance,caching,optimization,computer-architecture,perf

The first thing you want to make sure of is that no other computational process is running on your machine. That's a server CPU so I thought that could be a problem. If you use multi-threading in your program, and you distribute an equal amount of work between threads, you might be...

hardware implementation of Modulo m adder

verilog,fpga,system-verilog,computer-architecture

Do not mix blocking and non-blocking assignments in the same always block. The sum3e variable depends on sum3a and sum3b, but at the same time the values of sum3a and sum3b are changing because of non-blocking assignments. This will result in logical errors.

Integer instructions can go past branches.what does this mean?

computer-architecture

I believe this is referring to the concept that in some processors the pipeline causes instructions after a branch to execute. In a sequence like this: MOVL R10, R9 BNEQ SOMEHWERE ADDL3 R1, R2, R3 The add instruction gets executed regardless of the outcome of the test and branch in the...

cache reads and writes

caching,memory,computer-architecture

Wikipedia explains it quite well, actually. On one hand write-back vs. write-through defines when the data is written to the backing store (aka main memory): Write-through – write is done synchronously both to the cache and to the backing store. Write-back (or write-behind) – initially, writing is done only to...
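The difference between the two policies can be seen in a toy model. This is a deliberately simplified sketch, not how real hardware is organized: the class `ToyCache`, its one-value-per-address lines and the explicit `evict` call are all made up for illustration:

```python
class ToyCache:
    """One value per address, with a selectable write policy."""

    def __init__(self, policy: str):
        if policy not in ("write-through", "write-back"):
            raise ValueError("unknown policy")
        self.policy = policy
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified in cache but not yet in memory
        self.memory = {}    # the backing store (main memory)

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        if self.policy == "write-through":
            # Cache and backing store are updated at the same time.
            self.memory[addr] = value
        else:
            # Write-back: only the cache is updated; the line is marked dirty.
            self.dirty.add(addr)

    def evict(self, addr: int) -> None:
        # A dirty line must be written to the backing store before it is dropped.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


if __name__ == "__main__":
    cache = ToyCache("write-back")
    cache.write(0x10, 7)
    print("in memory before eviction:", 0x10 in cache.memory)
    cache.evict(0x10)
    print("in memory after eviction:", cache.memory[0x10])
```

With write-through the backing store is never stale, at the cost of a memory write on every store; with write-back the memory write is deferred until the line leaves the cache.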

Why bits are numbered from right to left?

assembly,computer-science,computer-architecture,bits

Bits are not numbered from right to left. They are numbered from lowest weight (the lowest weight bit getting the number 0 or 1 depending on the convention chosen) to highest weight (which can be 15 or 16, 31 or 32, 63 or 64, ...). One reason to number them...

Encoder and My Challenges on Digital Logic

computer-architecture,encoder,digital-logic,vlsi,digital-design

At initial state: Q = 0, D1D0 = 00, Q' = 1, JK = 00 encoder input:0001 After 1st clock pulse, D1D0 = 01 encoder input:0011 so JK =01 resets output Q = 0, Q' =1 After 2nd clock pulse, D1D0 = 10 encoder input:0101 So JK =10 sets output...

Software Breakpoints and Modern OOOE Processors

debugging,operating-system,computer-architecture

That is why all the ordered events such as interrupts, faults, exceptions, etc, are always handled at the commit point in out-of-order processors, where the original program order is restored and the correct machine state can be captured. This means that you may know of a pending event but still...

How does the cpu decide which data it puts in what memory (ram, cache, registers)?

memory,cpu,computer-architecture

The answer to this question is an entire course in itself! A very brief summary of what (usually) happens is that: You, the programmer, specify what goes in RAM. Well, the compiler does it on your behalf, but you're in control of this by how you declare your variables. Whenever...

Are cache-line-ping-pong and false sharing the same?

caching,multicore,computer-architecture,processor,false-sharing

Summary: False sharing and cache-line ping-ponging are related but not the same thing. False sharing can cause cache-line ping-ponging, but it is not the only possible cause since cache-line ping-ponging can also be caused by true sharing. Details: False sharing False sharing occurs when different threads have data that is...

Index register in cpu (Computer org. and arc.)

indexing,cpu,cpu-registers,computer-architecture,cpu-architecture

A register can hold any value that fits in the number of bits it has. What makes the value negative or not is the way you treat it. The question you should be asking yourself is - does your basic CPU support signed arithmetic operations, and how does it encode...

Cache calculating block offset and index

caching,computer-architecture,addressing

The number of block offset bits is simply calculated as log2(cache_line_size). The reason is that all systems that I know of are byte addressable. So you need enough bits to index any byte in the block. Although most systems have a word size that is larger than a single byte, they still...

Filling x86_64 Pointers Top Sixteen Bits With Tag Data?

x86,cpu,x86-64,computer-architecture

No, you cannot. The top 16 bits are currently required to all be the same (e.g, 0x0000… or 0xffff…) — addresses which do not fit this pattern will always cause a fault. Future revisions may have "real" address space in this range, so it's not safe to use these bits for...
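As a sketch of the check the hardware effectively performs, assuming 48-bit virtual addressing: as far as I know, the precise rule is that bits 63 through 47 must all be equal, that is, the address is the sign extension of bit 47. The function name is made up:

```python
def is_canonical_48(addr: int) -> bool:
    """True if a 64-bit address is canonical under 48-bit virtual addressing."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("expected an unsigned 64-bit value")
    upper = addr >> 47  # bits 63..47, seventeen bits in total
    return upper == 0 or upper == 0x1FFFF


if __name__ == "__main__":
    print(is_canonical_48(0x0000_7FFF_FFFF_FFFF))  # top of the lower half
    print(is_canonical_48(0xFFFF_8000_0000_0000))  # bottom of the upper half
    print(is_canonical_48(0x1234_0000_0000_1000))  # tag data stuffed into the top bits
```

A pointer with tag data in its top bits fails this check, so the tag would have to be stripped (and the sign extension restored) before every dereference.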

What could go wrong or why could software stop working if the system where the software is running is updated to a faster system?

upgrade,computer-architecture

One possible way to have software break on better hardware would be if there is a race condition bug. On slower hardware it might never come up because the hardware conditions make it just slow enough, but on faster hardware 2 threads of a program could inadvertently end up competing...

Computer Architecture: Speedup

performance,equation,computer-architecture,parallelism-amdahl

Suppose current execution time is 100 seconds. Desired speedup is 6/5, so that means the new time over the old time should be 5/6, a reduction of 16.67% or 16.67 seconds. (That's all Amdahl's law is!) You know that 20 seconds are spent in memory access, 50 seconds are spent...

What does stripping off the ASCII template mean?

assembly,ascii,cpu-registers,computer-architecture,lc3

As you noticed, operations are performed on ASCII values of entered characters. In assembly, if you read characters from keyboard, you really get their ASCII values in registers, so let's say you enter 2 and 3 and want to add them, then you are really adding 50+51. You have to...
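The same conversion written out as a small sketch; the helper names are hypothetical, and the only fact relied on is that the ASCII digits '0' to '9' have the consecutive codes 48 to 57:

```python
ASCII_ZERO = ord("0")  # 48, the offset that gets stripped off


def digit_value(ch: str) -> int:
    """Convert one ASCII digit character to the number it represents."""
    if len(ch) != 1 or not "0" <= ch <= "9":
        raise ValueError("expected a single digit character")
    return ord(ch) - ASCII_ZERO


def digit_char(value: int) -> str:
    """Convert a number from 0 to 9 back to its ASCII digit character."""
    if not 0 <= value <= 9:
        raise ValueError("expected a value from 0 to 9")
    return chr(value + ASCII_ZERO)


if __name__ == "__main__":
    print(ord("2") + ord("3"))                  # adding the raw character codes
    print(digit_value("2") + digit_value("3"))  # adding the actual digit values
    print(digit_char(5))
```

The offset has to be added back before printing, otherwise the result is displayed as whatever character has that small code.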

Memory capacity of a RAM [closed]

computer-architecture,microprocessors

Your solution is incorrect. The question is what the capacity is, not the number of addressable locations. Your answer should be measured in units of storage (e.g, bits, bytes, or their multiples). Since this is clearly a homework problem, I'm not going to give an exact answer. But I will...

Is Translation Lookaside Buffer (TLB) the same level as L1 cache to CPU? So, Can I overlap virtual address translation with the L1 cache access?

caching,memory,computer-architecture,tlb

Yes, that's the whole point of a VIPT cache. Since the virtual and physical addresses match over the lower bits (the page offset is the same), you don't need to translate them. Most VIPT caches are built around this (note that this limits the number of sets you can...

Cache memory organization

c,caching,computer-architecture

I assume by cache memory size of 512 you mean 512 bytes. And let's assume the cache is empty and has nothing in it that is mapped to the a[] or b[] region. If your code runs with write back, then the first loop will miss the cache on every access. The third loop will...

Is it possible to omit rounding of intermediate results during arithmetic operation on multiple FP operands?

floating-point,computer-science,computer-architecture,floating-point-precision

According to the paper referenced in the question, it is possible to calculate the dot product of a pair of length N vectors with a single rounding operation at the end, getting the closest representable result to the dot product. In practice, current computers round intermediate results, which often results...

Where is -32768 coming from?

assembly,load,cpu-registers,computer-architecture,lc3

In KBSR (keyboard status register) bit 15 is set when a key is read, so you get 0b1000_0000_0000_0000 (or 0x8000) ... which, interpreted as a two's complement negative number, happens to be -32768 decimal.
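The reinterpretation is easy to reproduce. A minimal sketch, with a made-up helper name, of reading a 16-bit word as a two's complement value:

```python
def to_signed16(word: int) -> int:
    """Interpret the low 16 bits of word as a two's complement number."""
    word &= 0xFFFF
    # If bit 15 is set, the value is negative: subtract 2**16.
    return word - 0x10000 if word & 0x8000 else word


if __name__ == "__main__":
    print(to_signed16(0x8000))  # only bit 15 set
    print(to_signed16(0x7FFF))  # largest positive 16-bit value
```

So a register showing -32768 simply means that bit 15 is set and all other bits are clear.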

what situations when to read data out of kernel space to user space?

operating-system,gpu,gpgpu,computer-architecture

The operating system's job is to allow a lot of components, both hardware and software, to play nice with each other. In general, userland programs can't directly manipulate peripherals nor interfere with each other. I'm not familiar with the specific setup that you're citing, but it doesn't sound unusual. The USB...

Apple LLVM 6.0 Error After Changing Architectures

ios,xcode,open-source,computer-architecture,llvm-clang

I was trying to compile ARM code as if it were ARM64. Since it's just an app and not an extension, I don't need to do that. So, under Project>Target>Build Settings> Architectures I changed the Architectures key to standard architectures, and the Valid Architectures key to armv7.

Why Do Computers Use the Binary Number System (0,1)?

architecture,binary,hardware,ternary-operator,computer-architecture

This is a classic example of software thinking in a hardware world :) Oh my. Am I the only one who remembers vacuum tubes, or valves as we used to call them? Logic DIDN'T start with transistors, friends. The first computer (ENIAC) used lots of tubes, diodes and relays. As...