Tuesday, March 20, 2007

Computer storage

From Wikipedia, the free encyclopedia

1 GiB of SDRAM mounted in a personal computer

Computer storage, computer memory, and, often casually, memory refer to computer components, devices, and recording media that retain data for some interval of time. Computer storage provides one of the core functions of the modern computer, that of information retention. It is one of the fundamental components of all modern computers and, coupled with a central processing unit (CPU), implements the basic von Neumann computer model used since the 1940s.

In contemporary usage, memory usually refers to a form of solid-state storage known as random access memory (RAM) and sometimes other forms of fast but temporary storage. Similarly, storage more commonly refers to mass storage: optical discs, forms of magnetic storage like hard disks, and other types of storage that are slower than RAM but of a more permanent nature. These contemporary distinctions are helpful because they are also fundamental to the architecture of computers in general. They also reflect an important technical difference between memory and mass storage devices, which has been blurred by the historical usage of the terms "main storage" (and sometimes "primary storage") for random access memory, and "secondary storage" for mass storage devices. This is explained in the following sections, in which the traditional "storage" terms are used as sub-headings for convenience.

Purposes of storage

The fundamental components of a general-purpose computer are the arithmetic and logic unit, control circuitry, storage space, and input/output devices. If storage were removed, the device we had would be a simple digital signal processing device (e.g. calculator, media player) instead of a computer. The ability to store the instructions that form a computer program, and the information that those instructions manipulate, is what makes stored-program architecture computers versatile.

A digital computer represents information using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be manipulated by any computer whose storage space is large enough to accommodate the corresponding data, or the binary representation of the piece of information. For example, a computer with a storage space of eight million bits, or one megabyte, could be used to edit a small novel.
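The conversion of text into bits described above can be sketched in a few lines of Python. This is only an illustration: the helper names `to_bits` and `fits` are hypothetical, and UTF-8 is assumed as the text encoding, which the article does not specify.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 encoding of text as groups of eight binary digits."""
    data = text.encode("utf-8")  # text -> bytes (assumed encoding: UTF-8)
    return " ".join(f"{byte:08b}" for byte in data)


def fits(text: str, capacity_bits: int) -> bool:
    """True if the binary representation of text fits in capacity_bits."""
    return len(text.encode("utf-8")) * 8 <= capacity_bits


print(to_bits("Hi"))  # 01001000 01101001

ONE_MEGABYTE_BITS = 8_000_000   # eight million bits, as in the text
novel = "x" * 500_000           # stand-in for a 500,000-character novel
print(fits(novel, ONE_MEGABYTE_BITS))  # True
```

Each character of plain English text occupies one byte under this encoding, so a one-megabyte store holds up to a million such characters.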

Various forms of storage, divided according to their distance from the central processing unit. Additionally, common technology and capacity found in home computers of 2005 is indicated next to some items.

Various forms of storage, based on various natural phenomena, have been invented. So far, no practical universal storage medium exists, and all forms of storage have some drawbacks. Therefore a computer system usually contains several kinds of storage, each with an individual purpose, as shown in the diagram.

Primary storage

Primary storage is directly connected to the central processing unit of the computer. It must be present for the CPU to function correctly, just as, in a biological analogy, the lungs must be present (for oxygen storage) for the heart to function (to pump and oxygenate the blood). As shown in the diagram, primary storage typically consists of three kinds of storage:

  • Processor registers are internal to the central processing unit. Registers contain information that the arithmetic and logic unit needs to carry out the current instruction. They are technically the fastest of all forms of computer storage, being switching transistors integrated on the CPU's silicon chip, and functioning as electronic "flip-flops".
  • Cache memory is a special type of internal memory used by many central processing units to increase their performance or "throughput". Some of the information in the main memory is duplicated in the cache memory, which is slightly slower but of much greater capacity than the processor registers, and faster but much smaller than main memory. Multi-level cache memory is also commonly used: the "primary cache" is the smallest, fastest and closest to the processing device; the "secondary cache" is larger and slower, but still faster and much smaller than main memory.
  • Main memory contains the programs that are currently being run and the data the programs are operating on. In modern computers, the main memory is the electronic solid-state random access memory. It is directly connected to the CPU via a "memory bus" (shown in the diagram) and a "data bus". The arithmetic and logic unit can very quickly transfer information between a processor register and locations in main storage, also known as "memory addresses". The memory bus is also called an address bus or front side bus, and both buses are high-speed digital "superhighways". Access methods and speed are two of the fundamental technical differences between memory and mass storage devices. (Note that all memory sizes and storage capacities shown in the diagram will inevitably be exceeded with advances in technology over time.)
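The way a cache duplicates part of main memory can be illustrated with a small simulation. This is a minimal sketch, not a description of any real processor: the class names are hypothetical, and a direct-mapped layout (each address maps to exactly one cache line) is assumed for simplicity.

```python
class MainMemory:
    """Large, slow store: every read is counted."""

    def __init__(self, size):
        self.cells = [0] * size
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.cells[address]


class DirectMappedCache:
    """Small, fast store holding copies of some main-memory cells."""

    def __init__(self, memory, lines):
        self.memory = memory
        self.lines = lines
        self.tags = [None] * lines  # address currently held by each line
        self.data = [0] * lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.lines  # assumed mapping: address modulo line count
        if self.tags[index] == address:
            self.hits += 1
        else:
            # Miss: fetch from main memory and keep a copy for next time.
            self.misses += 1
            self.data[index] = self.memory.read(address)
            self.tags[index] = address
        return self.data[index]


memory = MainMemory(1024)
memory.cells[5] = 42
cache = DirectMappedCache(memory, lines=8)
values = [cache.read(5) for _ in range(10)]
print(values[0], cache.hits, cache.misses, memory.reads)  # 42 9 1 1
```

Ten reads of the same address reach main memory only once; the other nine are served from the cache.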

Secondary and off-line storage

Secondary storage requires the computer to use its input/output channels to access the information, and is used for long-term storage of persistent information. However, most computer operating systems also use secondary storage devices as virtual memory, to artificially increase the apparent amount of main memory in the computer. Secondary storage is also known as "mass storage", as shown in the diagram above. Secondary or mass storage is typically of much greater capacity than primary storage (main memory), but it is also much slower.

In modern computers, hard disks are usually used for mass storage. The time taken to access a given byte of information stored on a hard disk is typically a few thousandths of a second, or milliseconds. By contrast, the time taken to access a given byte of information stored in random access memory is measured in thousand-millionths of a second, or nanoseconds. This illustrates the very significant speed difference that distinguishes solid-state memory from rotating magnetic storage devices: hard disks are typically about a million times slower than memory. Rotating optical storage devices, such as CD and DVD drives, are typically even slower than hard disks, although their access speeds are likely to improve with advances in technology.

Because the disk behind virtual memory is about a million times slower than "real" memory, heavy reliance on virtual memory significantly degrades the performance of a computer. Many operating systems implement virtual memory using what is called a swap file or "cache file". The main historical advantage of virtual memory was that it was much less expensive than real memory. That advantage is less relevant today, yet most operating systems continue to implement it despite the performance penalties.
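The "million times slower" figure follows directly from the ratio of milliseconds to nanoseconds. The sketch below works through that arithmetic and shows how even a small fraction of accesses going to disk dominates the average access time. The specific timings (5 ms and 5 ns) and the function names are assumptions chosen for illustration, not measurements.

```python
DISK_ACCESS_S = 5e-3  # assumed: 5 milliseconds per hard disk access
RAM_ACCESS_S = 5e-9   # assumed: 5 nanoseconds per RAM access


def slowdown(slow_s, fast_s):
    """How many times slower the slow device is than the fast one."""
    return slow_s / fast_s


def average_access_time(disk_fraction, ram_s=RAM_ACCESS_S, disk_s=DISK_ACCESS_S):
    """Mean access time when disk_fraction of all accesses go to disk."""
    return (1 - disk_fraction) * ram_s + disk_fraction * disk_s


print(f"{slowdown(DISK_ACCESS_S, RAM_ACCESS_S):,.0f}x")  # 1,000,000x

for fraction in (0.0, 0.0001, 0.01):
    print(fraction, average_access_time(fraction))
```

Under these assumed figures, sending one access in ten thousand to disk already raises the average access time by roughly a factor of one hundred.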

Off-line storage is a system where the storage medium can be easily removed from the storage device. Off-line storage is used for data transfer and archival purposes. In modern computers, CDs, DVDs, memory cards, flash memory devices including "USB drives", floppy disks, Zip disks and magnetic tapes are commonly used for off-line mass storage purposes. "Hot-pluggable" USB hard disks are also available. Off-line storage devices used in the past include punched cards, microforms, and removable Winchester disk drums.

Tertiary and database storage

Tertiary storage is a system where a robotic arm will "mount" (connect) or "dismount" off-line mass storage media (described in the previous section) according to the computer operating system's demands. Tertiary storage is used in the realms of enterprise storage and scientific computing on large computer systems and business computer networks, and is something a typical personal computer user never sees firsthand.

Database storage is a system where information in computers is stored in large databases, data banks, data warehouses, or data vaults. It involves packing large numbers of storage devices onto a series of shelves in a room, usually an office, all linked together. The information in database storage systems can be accessed by a supercomputer, mainframe computer, or personal computer. Databases, data banks, data warehouses, and the like can only be accessed by authorized users.

Network storage

Network storage is any type of computer storage that involves accessing information over a computer network. Network storage arguably allows an organization to centralize its information management and to reduce the duplication of information. Network storage includes:

  • Network-attached storage is secondary or tertiary storage attached to a computer which another computer can access at file level over a local-area network, a private wide-area network, or in the case of online file storage, over the Internet.
  • A storage area network (SAN) provides other computers with storage capacity over a network. The crucial difference between network-attached storage (NAS) and a SAN is that the former presents and manages file systems to client computers, whilst a SAN provides access to disks at block addressing level, leaving it to attaching systems to manage data or file systems within the provided capacity.
  • Network computers are computers that do not contain internal secondary storage devices. Instead, documents and other data are stored on network-attached storage.

Confusingly, these terms are sometimes used differently. Primary storage can be used to refer to local random-access disk storage, which should properly be called secondary storage. If this type of storage is called primary storage, then the term secondary storage would refer to offline, sequential-access storage like tape media.

Characteristics of storage

The division into primary, secondary, tertiary and off-line storage is based on memory hierarchy, or distance from the central processing unit. There are also other ways to characterize various types of storage.

Volatility of information

  • Volatile memory requires constant power to maintain the stored information. Volatile memory is typically used only for primary storage. (Primary storage is not necessarily volatile, even though today's most cost-effective primary storage technologies are. Non-volatile technologies have been widely used for primary storage in the past and may again be in the future.)
  • Non-volatile memory will retain the stored information even if it is not constantly supplied with electric power. It is suitable for long-term storage of information, and therefore used for secondary, tertiary, and off-line storage.
  • Dynamic memory is volatile memory which also requires that stored information is periodically refreshed, or read and rewritten without modifications.

Ability to access non-contiguous information

  • Random access means that any location in storage can be accessed at any moment in the same, usually small, amount of time. This makes random access memory well suited for primary storage.
  • Sequential access means that accessing a piece of information will take a varying amount of time, depending on which piece of information was accessed last. The device may need to seek (e.g. to position the read/write head correctly), or cycle (e.g. to wait for the correct location in a revolving medium to appear below the read/write head).
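The difference between the two access patterns can be expressed as a simple cost model. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the cost units are abstract "steps" rather than real timings.

```python
def random_access_time(position, fixed_cost=1):
    """Random access: the cost does not depend on the position requested."""
    return fixed_cost


def sequential_access_time(position, last_position, cost_per_step=1):
    """Sequential access: the cost grows with the distance from the last position."""
    return abs(position - last_position) * cost_per_step


def total_time(requests, sequential):
    """Total cost of serving requests in order, starting at position 0."""
    last, total = 0, 0
    for position in requests:
        if sequential:
            total += sequential_access_time(position, last)
        else:
            total += random_access_time(position)
        last = position
    return total


requests = [10, 500, 20, 900]
print(total_time(requests, sequential=False))  # 4
print(total_time(requests, sequential=True))   # 10 + 490 + 480 + 880 = 1860
```

The same four requests cost the same on the random access device regardless of order, while the sequential device pays for every jump between distant positions.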

Ability to change information

  • Read/write storage, or mutable storage, allows information to be overwritten at any time. A computer without some amount of read/write storage for primary storage purposes would be useless for many tasks. Modern computers typically use read/write storage also for secondary storage.
  • Read-only storage retains the information stored at the time of manufacture, and write-once storage (WORM) allows the information to be written only once at some point after manufacture. These are called immutable storage. Immutable storage is used for tertiary and off-line storage. Examples include CD-R.
  • Slow write, fast read storage is read/write storage which allows information to be overwritten multiple times, but with the write operation being much slower than the read operation. Examples include CD-RW.

Addressability of information

  • In location-addressable storage, each individually accessible unit of information in storage is selected with its numerical memory address. In modern computers, location-addressable storage is usually limited to primary storage, accessed internally by computer programs, since location-addressability is very efficient, but burdensome for humans.
  • In file system storage, information is divided into files of variable length, and a particular file is selected with human-readable directory and file names. The underlying device is still location-addressable, but the operating system of a computer provides the file system abstraction to make the operation more understandable. In modern computers, secondary, tertiary and off-line storage use file systems.
  • In content-addressable storage, each individually accessible unit of information is selected with a hash value, a short identifier computed from the content itself rather than from the memory address at which the information is stored. Content-addressable storage can be implemented using software (computer program) or hardware (computer device), with hardware being the faster but more expensive option.
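The three addressing schemes above can be contrasted in a short sketch. Everything here is illustrative: the `ContentStore` class and the example path are hypothetical, and SHA-256 from Python's standard `hashlib` module is assumed as the hash function, with the hash taken over the stored content.

```python
import hashlib

# Location-addressable: a numerical address selects each unit.
memory = bytearray(16)
memory[3] = 0x2A
print(memory[3])  # 42

# File system: a human-readable name selects variable-length data.
# A dict stands in for the directory structure (hypothetical path).
files = {"/home/user/notes.txt": b"buy milk"}
print(files["/home/user/notes.txt"])  # b'buy milk'


# Content-addressable: a hash of the content selects the data.
class ContentStore:
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # identifier derived from content
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = ContentStore()
key = store.put(b"buy milk")
print(store.get(key))                 # b'buy milk'
print(store.put(b"buy milk") == key)  # True: same content, same key
```

Because the key is derived from the data, storing the same content twice yields the same key, which is one reason such stores are used to avoid duplicates.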

Capacity and performance

  • Storage capacity is the total amount of stored information that a storage device or medium can hold. It is expressed as a quantity of bits or bytes (e.g. 10.4 megabytes).
  • Storage density refers to the compactness of stored information. It is the storage capacity of a medium divided by a unit of length, area or volume (e.g. 1.2 megabytes per square centimeter).
  • Latency is the time it takes to access a particular location in storage. The relevant unit of measurement is typically the nanosecond for primary storage, the millisecond for secondary storage, and the second for tertiary storage. It may make sense to separate read latency and write latency, and in the case of sequential access storage, minimum, maximum and average latency.
  • Throughput is the rate at which information can be read from or written to the storage. In computer storage, throughput is usually expressed in terms of megabytes per second or MB/s, though bit rate may also be used. As with latency, read rate and write rate may need to be differentiated.
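Latency and throughput combine into a rough estimate of how long a transfer takes, and capacity and area combine into density. The sketch below assumes the simplest possible model (one initial latency, then a steady transfer rate); the function names and the example figures are hypothetical.

```python
MB = 1_000_000  # bytes in a megabyte (decimal convention assumed)


def transfer_time(size_bytes, latency_s, throughput_bytes_per_s):
    """Estimated time to read size_bytes: initial latency plus steady transfer."""
    return latency_s + size_bytes / throughput_bytes_per_s


def storage_density(capacity_bytes, area_cm2):
    """Capacity divided by the area of the medium, in bytes per square centimeter."""
    return capacity_bytes / area_cm2


# Assumed figures: 100 MB file, 10 ms latency, 50 MB/s throughput.
print(transfer_time(100 * MB, 0.010, 50 * MB))

# Assumed figures: 120 MB medium with 100 square centimeters of surface.
print(storage_density(120 * MB, 100) / MB, "MB per square centimeter")
```

For large transfers the throughput term dominates, while for many small accesses the latency term does, which is why the two are reported separately.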

Technologies, devices and media

Magnetic storage

Magnetic storage uses different patterns of magnetization on a magnetically coated surface to store information. Magnetic storage is non-volatile. The information is accessed using one or more read/write heads. Since the read/write head only covers a part of the surface, magnetic storage is sequential access and must seek, cycle or both. In modern computers, the magnetic surface takes forms such as the hard disks, floppy disks, and magnetic tapes mentioned above.

In early computers, magnetic storage was also used for primary storage, in the form of magnetic drum, core memory, core rope memory, thin film memory, twistor memory or bubble memory. Also unlike today, magnetic tape was often used for secondary storage.

Semiconductor storage

Semiconductor memory uses semiconductor-based integrated circuits to store information. A semiconductor memory chip may contain millions of tiny transistors or capacitors. Both volatile and non-volatile forms of semiconductor memory exist. In modern computers, primary storage almost exclusively consists of dynamic volatile semiconductor memory or dynamic random access memory. Since the turn of the century, a type of non-volatile semiconductor memory known as flash memory has steadily gained share as off-line storage for home computers. Non-volatile semiconductor memory is also used for secondary storage in various advanced electronic devices and specialized computers.

Optical disc storage

Optical disc storage uses tiny pits etched on the surface of a circular disc to store information, and reads this information by illuminating the surface with a laser diode and observing the reflection. Optical disc storage is non-volatile and sequential access. Forms currently in common use include the CD and DVD.

Magneto-optical disc storage

Magneto-optical disc storage is optical disc storage where the magnetic state on a ferromagnetic surface stores information. The information is read optically and written by combining magnetic and optical methods. Magneto-optical disc storage is non-volatile, sequential access, slow write, fast read storage used for tertiary and off-line storage.

Ultra Density Optical disc storage

An Ultra Density Optical disc, or UDO, is a 5.25" ISO cartridge optical disc encased in a dust-proof caddy, which can store up to 30 GB of data. Its design is based on the magneto-optical (MO) disc but uses phase-change technology combined with a blue-violet laser. A UDO disc can store substantially more data than an MO disc because of the shorter wavelength (405 nm) of the blue-violet laser employed; MOs use a 650 nm red laser. Because the blue-violet beam is narrower than the red-laser beam when burning to a disc, it allows more information to be stored digitally in the same amount of space.
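The effect of the shorter wavelength can be estimated with a back-of-the-envelope calculation. The sketch below assumes that the focused spot diameter scales in proportion to the wavelength and that the optics are otherwise identical, so the area of each recorded mark scales with the wavelength squared. That is a simplification: real drives also differ in lens design and encoding, so the result is only a rough indication and not a UDO specification.

```python
RED_NM = 650          # MO laser wavelength, from the text
BLUE_VIOLET_NM = 405  # UDO laser wavelength, from the text


def areal_density_gain(old_wavelength_nm, new_wavelength_nm):
    """Estimated gain in bits per unit area.

    Assumes spot diameter is proportional to wavelength (same optics),
    so spot area is proportional to wavelength squared.
    """
    return (old_wavelength_nm / new_wavelength_nm) ** 2


gain = areal_density_gain(RED_NM, BLUE_VIOLET_NM)
print(f"{gain:.2f}")  # 2.58
```

Under this assumption alone, the blue-violet laser would fit roughly two and a half times as much data into the same area.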

Current generations of UDO store up to 30 GB. Versions with 60 GB and 120 GB are in development and are expected to arrive in 2007 and beyond, and capacities of up to 500 GB have been speculated as a possibility for UDO. [1]

Optical Jukebox storage

Optical jukebox storage is a robotic storage device that uses optical disk drives and can automatically load and unload optical disks, providing terabytes of near-line information. The devices are often called optical disk libraries, robotic drives, or autochangers. Jukebox devices may have up to 1,000 slots for disks, and usually have a picking device that traverses the slots and drives. The arrangement of the slots and picking devices affects performance, depending on the space between a disk and the picking device. Seek times and transfer rates vary depending upon the optical technology. Jukeboxes are used in high-capacity archive storage environments such as imaging, medical, and video. Hierarchical storage management (HSM) is a strategy that moves little-used or unused files from fast magnetic storage to optical jukebox devices in a process called migration. If the files are needed, they are migrated back to magnetic disk.
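The migration strategy described above can be sketched as a simple policy. This is a toy model under stated assumptions: the `StoredFile` class, the tier names, and the "idle for at least N days" rule are all hypothetical, since the text does not say how "little-used" is decided.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    last_access_day: int
    tier: str = "magnetic"  # fast tier; the slow tier is "jukebox"


def migrate(files, today, idle_days):
    """Move files idle for at least idle_days from magnetic disk to the jukebox."""
    moved = []
    for f in files:
        if f.tier == "magnetic" and today - f.last_access_day >= idle_days:
            f.tier = "jukebox"
            moved.append(f.name)
    return moved


def recall(f, today):
    """Migrate a file back to magnetic disk when it is needed again."""
    f.tier = "magnetic"
    f.last_access_day = today
    return f


files = [StoredFile("scan-001.img", 10), StoredFile("report.doc", 95)]
print(migrate(files, today=100, idle_days=30))  # ['scan-001.img']
print(recall(files[0], today=101).tier)         # magnetic
```

The recently used file stays on fast storage, while the idle one moves to the jukebox until something asks for it again.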

Other early methods

Paper tape and punch cards have been used to store information for automatic processing since the 1890s, long before general-purpose computers existed. Information was recorded by punching holes into the paper or cardboard medium, and was read by electrically (or, later, optically) sensing whether a particular location on the medium was solid or contained a hole.

The Williams tube used a cathode ray tube, and the Selectron tube used a large vacuum tube, to store information. These primary storage devices were short-lived in the market, since the Williams tube was unreliable and the Selectron tube was expensive.

Delay line memory used sound waves in a substance such as mercury to store information. Delay line memory was dynamic volatile, cycle sequential read/write storage, and was used for primary storage.
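The "cycle sequential" behaviour of a delay line can be modelled as bits circulating in a loop, where a read must wait until the wanted bit reaches the output end. The sketch below is a simplified model, not a description of any particular machine: the `DelayLine` class is hypothetical, and one "tick" stands for the time one bit takes to emerge from the line.

```python
class DelayLine:
    """Bits circulate in a loop; only the bit at the output end can be read."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.head = 0   # index of the bit currently at the output end
        self.ticks = 0  # total time spent waiting, in bit times

    def read(self, address):
        # Wait for the wanted bit to come around to the output end.
        wait = (address - self.head) % len(self.bits)
        self.ticks += wait
        self.head = address
        return self.bits[address]


line = DelayLine([1, 0, 1, 1, 0, 0, 1, 0])
print(line.read(6), line.ticks)  # 1 6
print(line.read(2), line.ticks)  # 1 10
```

Reading address 2 after address 6 costs four ticks rather than two, because the bits only travel in one direction around the loop.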

Other proposed methods

Phase-change memory uses different mechanical phases of phase change material to store information, and reads the information by observing the varying electric resistance of the material. Phase-change memory would be non-volatile, random access read/write storage, and might be used for primary, secondary and off-line storage.

Holographic storage stores information optically inside crystals or photopolymers. Holographic storage can utilize the whole volume of the storage medium, unlike optical disc storage which is limited to a small number of surface layers. Holographic storage would be non-volatile, sequential access, and either write once or read/write storage. It might be used for secondary and off-line storage.

Molecular memory stores information in polymers that can store electric charge. Molecular memory might be especially suited for primary storage.
