In computer science, a cache is a collection of data
duplicating original values stored elsewhere or computed earlier, where the
original data is expensive to fetch (owing to longer access time) or to compute,
compared to the cost of reading the cache. In other words, a cache is a
temporary storage area where frequently accessed data can be stored for rapid
access. Once the data is stored in the cache, it can be used in the future by
accessing the cached copy rather than re-fetching or recomputing the original
data.
A cache has proven to be extremely effective in many areas of
computing because access patterns in typical computer applications have locality
of reference. There are several kinds of locality, but this article primarily
deals with data that are accessed close together in time (temporal locality).
The data might or might not be located physically close to each other (spatial
locality).
History
Use of the word cache in the computer context
originated in 1967 during preparation of an article for publication in the IBM
Systems Journal. The paper concerned an exciting memory improvement in the Model 85,
a latecomer in the IBM System/360 product line. The Journal editor, Lyle R.
Johnson, pleaded for a more descriptive term than high-speed buffer. When none
was forthcoming, he suggested the noun cache, from the French noun meaning a
safekeeping or storage place. The paper was published in early 1968, the
authors were honored by IBM, their work was widely welcomed and subsequently
improved upon, and cache soon became standard usage in computer
literature.
Operation
A cache is a block of memory for temporary storage of data
likely to be used again. The CPU and hard drive frequently use a cache, as do
web browsers and web servers.
A cache is made up of a pool of entries. Each entry has a
datum (a piece of data), which is a copy of the datum in some backing store.
Each entry also has a tag, which specifies the identity of the datum in the
backing store of which the entry is a copy.
When the cache client (a CPU, web browser, or operating system)
wishes to access a datum presumably in the backing store, it first checks the
cache. If an entry can be found with a tag matching that of the desired datum,
the datum in the entry is used instead. This situation is known as a cache hit.
So, for example, a web browser program might check its local cache on disk to
see if it has a local copy of the contents of a web page at a particular URL. In
this example, the URL is the tag, and the content of the web page is the datum.
The percentage of accesses that result in cache hits is known as the hit rate or
hit ratio of the cache.
The alternative situation, when the cache is consulted and
found not to contain a datum with the desired tag, is known as a cache miss. The
previously uncached datum fetched from the backing store during miss handling is
usually copied into the cache, ready for the next access.
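As a rough sketch, the lookup just described can be expressed in a few lines of Python; fetch_from_backing_store and the URL used below are hypothetical stand-ins for illustration, not part of any real API:

    # Minimal sketch of a read-through cache (illustrative only).
    # fetch_from_backing_store() stands in for any slow source: a disk,
    # a remote server, or an expensive computation.
    def fetch_from_backing_store(tag):
        return "datum for %s" % tag          # pretend this is expensive

    cache = {}                               # maps tag -> datum
    hits = misses = 0

    def read(tag):
        global hits, misses
        if tag in cache:                     # cache hit: tag found
            hits += 1
            return cache[tag]
        misses += 1                          # cache miss: consult the backing store
        datum = fetch_from_backing_store(tag)
        cache[tag] = datum                   # keep a copy for later accesses
        return datum

    read("http://example.com/")              # miss
    read("http://example.com/")              # hit
    print(hits / (hits + misses))            # hit ratio: 0.5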
During a cache miss, the cache usually evicts some other entry
in order to make room for the previously uncached datum. The heuristic used to
select the entry to evict is known as the replacement policy. One popular
replacement policy, least recently used (LRU), replaces the least recently used
entry (see cache algorithms). More sophisticated caches also weigh use frequency
against the size of the stored contents, as well as the latencies and
throughputs of both the cache and the backing store. While this works well for
larger amounts of data, longer latencies, and slower throughputs, such as those
experienced with a hard drive or the Internet, such policies are not efficient
for caching main memory (RAM).
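To make the LRU policy concrete, here is a small sketch built on Python's collections.OrderedDict; the class name, the two-entry capacity, and the tags are arbitrary choices for illustration:

    from collections import OrderedDict

    # Sketch of an LRU cache with a fixed number of entries.
    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()         # ordered from least to most recently used

        def get(self, tag):
            if tag not in self.entries:
                return None                      # miss
            self.entries.move_to_end(tag)        # mark as most recently used
            return self.entries[tag]

        def put(self, tag, datum):
            if tag in self.entries:
                self.entries.move_to_end(tag)
            self.entries[tag] = datum
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False) # evict the least recently used entry

    c = LRUCache(2)
    c.put("a", 1); c.put("b", 2)
    c.get("a")                                   # "a" becomes most recently used
    c.put("c", 3)                                # evicts "b"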
When a datum is written to the cache,
it must at some point be written to the backing store as well. The timing of
this write is controlled by what is known as the write policy.
In a write-through cache, every write to the cache causes a
synchronous write to the backing store.
Alternatively, in a write-back (or write-behind) cache,
writes are not immediately mirrored to the store. Instead, the cache tracks
which of its locations have been written over (these locations are marked
dirty). The data in these locations is written back to the backing store when
those data are evicted from the cache, an effect referred to as a lazy write.
For this reason, a read miss in a write-back cache (which requires a block to be
replaced by another) will often require two memory accesses to service: one to
retrieve the needed datum, and one to write replaced data from the cache to the
store.
Data write-back may be triggered by other policies as well.
The client may make many changes to a datum in the cache, and then explicitly
notify the cache to write back the datum.
No-write allocation is a cache policy where only processor
reads are cached, thus avoiding the need for write-back or write-through when
the old value of the datum was absent from the cache prior to the write.
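The write policies described above can be sketched as follows; the dictionary standing in for the backing store, the class name, and the policy flags are illustrative assumptions rather than any standard interface:

    # Sketch contrasting write-through, write-back (with dirty tracking and an
    # explicit flush), and no-write allocation. Illustrative only.
    backing_store = {}          # stands in for RAM, a disk, a remote server, ...

    class Cache:
        def __init__(self, write_back=False, write_allocate=True):
            self.write_back = write_back         # False -> write-through
            self.write_allocate = write_allocate # False -> no-write allocation
            self.entries = {}                    # tag -> datum
            self.dirty = set()                   # tags written but not yet stored back

        def write(self, tag, datum):
            if tag not in self.entries and not self.write_allocate:
                backing_store[tag] = datum       # write miss bypasses the cache entirely
                return
            self.entries[tag] = datum
            if self.write_back:
                self.dirty.add(tag)              # lazy write: defer the store
            else:
                backing_store[tag] = datum       # write-through: synchronous store

        def flush(self):
            # Explicit write-back requested by the client, or run before eviction.
            for tag in list(self.dirty):
                backing_store[tag] = self.entries[tag]
                self.dirty.discard(tag)

    c = Cache(write_back=True)
    c.write("x", 42)            # only the cache is updated; "x" is marked dirty
    c.flush()                   # now backing_store["x"] == 42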
The data in the backing store may be changed by entities
other than the cache, in which case the copy in the cache may become out-of-date
or stale. Alternatively, when the client updates the data in the cache, copies
of that data in other caches will become stale. Communication protocols between
the cache managers which keep the data consistent are known as coherency
protocols.
Applications
CPU cache
Small memories on or close to the
CPU
can be made faster than the much larger main memory. Most CPUs since the 1980s
have used one or more caches, and modern microprocessors inside personal
computers may have as many as half a dozen, each specialized to a different part
of the task of executing programs.
Disk cache
While CPU caches are generally managed entirely by hardware,
other caches are managed by a variety of software. The page cache in main
memory, which is an example of a disk cache, is usually managed by the operating
system kernel.
While the hard drive's hardware disk buffer is sometimes
misleadingly referred to as "disk cache", its main functions are write
sequencing and read prefetching. Repeated cache hits are relatively rare, due to
the small size of the buffer in comparison to the HDD's capacity.
In turn, a fast local hard disk can be used to cache
information held on even slower data storage devices, such as remote servers
(web cache) or local tape drives or optical jukeboxes. Such a scheme is the main
concept of hierarchical storage management.
Web cache
Web caches are employed by web browsers and web proxy servers
to store previous responses from web servers, such as web pages. Web caches
reduce the amount of information that needs to be transmitted across the
network, as information previously stored in the cache can often be re-used.
This reduces bandwidth and processing requirements of the web server, and helps
to improve responsiveness for users of the web.
Modern web browsers employ a built-in web cache, but some
internet service providers or organizations also use a caching proxy server,
which is a web cache that is shared between all users of that network.
Another form of cache is P2P caching, where the files most
sought after by peer-to-peer applications are stored in an ISP's cache to accelerate
P2P transfers.
Other caches
The BIND DNS daemon caches a mapping of domain names to IP
addresses, as does a resolver library.
Write-through operation is common when operating over
unreliable networks (like an Ethernet LAN), because of the enormous complexity
of the coherency protocol required between multiple write-back caches when
communication is unreliable. For instance, web page caches and client-side
network file system caches (like those in NFS or SMB) are typically read-only or
write-through specifically to keep the network protocol simple and reliable.
Search engines also frequently make web pages they have indexed available from
their cache. For example, Google provides a "Cached" link next to each search
result. This is useful when web pages are temporarily inaccessible from a web
server.
Another type of caching is storing computed results that will
likely be needed again, or memoization. An example of this type of caching is
ccache, a program that caches the output of compilation in order to speed up
subsequent compilations.
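As a small illustration of memoization, Python's functools.lru_cache stores computed results keyed by their arguments; the Fibonacci function below is just a stand-in for any expensive, repeatable computation:

    from functools import lru_cache

    # Memoization: cache computed results so repeated calls with the same
    # arguments reuse the stored answer instead of recomputing it.
    @lru_cache(maxsize=None)
    def fibonacci(n):
        if n < 2:
            return n
        return fibonacci(n - 1) + fibonacci(n - 2)

    print(fibonacci(100))       # fast, because intermediate results are cached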
Database caching can substantially improve the throughput of database
applications, for example in the processing of indexes, data dictionaries, and
frequently used subsets of data.
The difference between buffer and cache
The terms are not mutually exclusive and the functions are
frequently combined; however, there is a difference in intent. A buffer is a
temporary memory location that is traditionally used because CPU instructions
cannot directly address data stored in peripheral devices. Thus, addressable
memory is used as an intermediate stage.
Additionally, such a buffer may be feasible when a large block
of data is assembled or disassembled (as required by a storage device), or when
data may be delivered in a different order than that in which it is produced.
Also, a whole buffer of data is usually transferred sequentially (for example,
to a hard disk), so buffering itself sometimes increases transfer performance. These
benefits are present even if the buffered data are written to the buffer once
and read from the buffer once.
A cache also increases transfer performance. A part of the
increase similarly comes from the possibility that multiple small transfers will
combine into one large block. But the main performance gain occurs because there
is a good chance that the same datum will be read from cache multiple times, or
that written data will soon be read. A cache's sole purpose is to reduce
accesses to the underlying slower storage. A cache is also usually an abstraction
layer that is designed to be invisible from the perspective of neighboring
layers.