In computer science a byte is a unit of measurement of information storage, most often consisting of eight bits. In many computer architectures it is a unit of memory addressing.
Originally, a byte was a small group of bits of a size convenient for data such as a single character from a Western character set. Its size was generally determined by the number of possible characters in the supported character set and was chosen to be a submultiple of the computer's word size; historically, bytes have ranged from five to twelve bits.
The popularity of IBM's System/360 architecture starting in the 1960s and the explosion of microcomputers based on 8-bit microprocessors in the 1980s have made eight bits by far the most common size for a byte. The term octet is widely used as a more precise synonym where ambiguity is undesirable (for example, in protocol definitions).
There has been considerable confusion about the meanings of SI prefixes used with the word "byte", such as kilo- (k or K) and mega- (M), as shown in the chart ''Quantities of bytes''. Since computer memory sizes are powers of 2 rather than powers of 10, the industry used the SI prefixes for the nearest binary quantities. Because of the confusion, a contract specifying a quantity of bytes must define what the prefixes mean in terms of the contract (i.e., the alternative binary equivalents, the actual decimal values, or a binary estimate based on the actual values).
A byte is one of the basic integral data types in some programming languages, especially system programming languages.
Meanings
The word "byte" has numerous closely related meanings:
# A contiguous sequence of a ''fixed'' number of bits (binary digits). The use of a byte to mean 8 bits has become nearly ubiquitous.
# A contiguous sequence of bits within a binary computer that comprises the ''smallest addressable sub-field'' of the computer's natural word size. That is, the smallest unit of binary data on which meaningful computation, or natural data boundaries, could be applied. For example, the CDC 6000 series scientific mainframes divided their 60-bit floating-point words into 10 six-bit bytes. These bytes conveniently held Hollerith data from punched cards, typically the upper-case alphabet and decimal digits. CDC also often referred to 12-bit quantities as bytes, each holding two 6-bit display code characters, due to the 12-bit I/O architecture of the machine. The PDP-10 used the assembly instructions LDB (load byte) and DPB (deposit byte) to extract and store bytes; these operations survive today in Common Lisp. Bytes of six, seven, or nine bits were used on some computers, for example within the 36-bit word of the PDP-10. The UNIVAC 1100/2200 series computers (now Unisys) addressed in both 6-bit (Fieldata) and 9-bit (ASCII) modes within their 36-bit word.
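The idea of variable-size bytes inside a fixed-size word can be sketched in a few lines of Python. This is only an illustration in the spirit of the LDB and DPB operations mentioned above, not a model of any real instruction set: the function names `load_byte`, `deposit_byte` and `split_word`, and the convention that `position` counts bits from the right-hand end of the word, are assumptions made for this example.

```python
WORD_BITS = 36  # word size of the PDP-10 and UNIVAC machines mentioned above


def load_byte(word, position, size):
    """Return the `size`-bit field whose lowest bit is `position` bits from the right."""
    mask = (1 << size) - 1
    return (word >> position) & mask


def deposit_byte(word, position, size, value):
    """Return `word` with the `size`-bit field at `position` replaced by `value`."""
    mask = ((1 << size) - 1) << position
    return (word & ~mask) | ((value << position) & mask)


def split_word(word, size):
    """Split a 36-bit word into consecutive `size`-bit bytes, leftmost first.

    Bits left over at the right-hand end (for example 1 bit when size is 7)
    are ignored.
    """
    count = WORD_BITS // size
    return [load_byte(word, WORD_BITS - size * (i + 1), size) for i in range(count)]


word = 0o010203040506          # twelve octal digits = 36 bits
print(split_word(word, 6))     # six 6-bit bytes: [1, 2, 3, 4, 5, 6]
print(split_word(word, 9))     # four 9-bit bytes
print(oct(deposit_byte(word, 30, 6, 0o77)))  # leftmost 6-bit byte replaced
```

The same 36-bit word yields six, five, or four bytes depending on whether the byte size is six, seven, or nine bits, which is why the byte size had to be stated explicitly on such machines.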
History
The term "byte" was coined by Dr. Werner Buchholz in July 1956, during the early design phase for the IBM Stretch computer. [1][2][3] Originally it was defined in instructions by a 4-bit byte-size field, allowing from one to sixteen bits (the production design reduced this to a 3-bit byte-size field, allowing from one to eight bits in a byte); typical I/O equipment of the period used six-bit units. A fixed eight-bit byte size was later adopted and promulgated as a standard by the System/360. The term "byte" comes from "bite," as in the smallest amount of data a computer could "bite" at once. The spelling change not only reduced the chance of a "bite" being mistaken for a "bit," but also was consistent with the penchant of early computer scientists to make up words and change spellings. However, back in the 1960s, the luminaries at the IBM Education Department in the UK were teaching that a bit was a Binary digIT and a byte was a BinarY TuplE (from n-tuple, i.e. [quin]tuple, [sex]tuple, [sep]tuple, [oc]tuple ...), turning "byte" into a backronym. A byte was also often referred to as "an 8-bit byte", reinforcing the notion that it was a tuple of ''n'' bits, and that other sizes were possible.
# A contiguous sequence of bits in a serial data stream, such as in modem or satellite communications, or from a disk-drive head, which is the smallest meaningful unit of data. These bytes might include start bits, stop bits, or parity bits, and thus could vary from 7 to 12 bits to contain a single 7-bit ASCII code.
# A ''datatype'' or synonym for a datatype in certain programming languages.
C, for example, defines ''byte'' as "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is capable of holding at least 256 different values (regardless of whether char is signed or unsigned).
Java's primitive byte data type is always defined as consisting of 8 bits and being a signed data type, holding values from -128 to 127.
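The difference between a signed and an unsigned reading of the same eight bits can be illustrated with a short Python sketch. It assumes two's-complement representation, which is what Java specifies for its byte type; the function names `as_unsigned` and `as_signed` are made up for this example.

```python
def as_unsigned(bits):
    """Interpret the low 8 bits as an unsigned value in the range 0..255."""
    return bits & 0xFF


def as_signed(bits):
    """Interpret the low 8 bits as a two's-complement value in the range -128..127."""
    value = bits & 0xFF
    return value - 256 if value >= 128 else value


print(as_unsigned(0b11111111))  # 255
print(as_signed(0b11111111))    # -1
print(as_signed(0b10000000))    # -128
print(as_signed(0b01111111))    # 127

# Either way, eight bits distinguish exactly 256 different values.
print(len({as_signed(b) for b in range(256)}))  # 256
```

Both interpretations cover 256 distinct values; only the range those values are mapped to differs.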
Early microprocessors, such as the Intel 8008 (the direct predecessor of the 8080, and then the 8086), provided a small number of features that operated on four bits, such as the DAA (decimal adjust) instruction and the "half carry" flag, which were used to implement decimal arithmetic routines. These four-bit quantities were called "nybbles," in homage to the then-common 8-bit "bytes."
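As a rough illustration of why four-bit quantities matter for decimal arithmetic, the following Python sketch splits a byte into its two nybbles and stores a two-digit decimal number with one digit per nybble (packed binary-coded decimal). It does not emulate the DAA instruction or any particular processor; the function names are hypothetical.

```python
def split_nybbles(byte):
    """Return the (high, low) four-bit halves of an 8-bit value."""
    return (byte >> 4) & 0xF, byte & 0xF


def pack_bcd(number):
    """Store a decimal number 0..99 with one decimal digit per nybble."""
    if not 0 <= number <= 99:
        raise ValueError("packed BCD in one byte holds 0..99 only")
    tens, ones = divmod(number, 10)
    return (tens << 4) | ones


def unpack_bcd(byte):
    """Recover the decimal number from a packed-BCD byte."""
    high, low = split_nybbles(byte)
    return high * 10 + low


print(split_nybbles(0xA7))   # (10, 7)
print(hex(pack_bcd(42)))     # 0x42
print(unpack_bcd(0x42))      # 42
```

In packed BCD the hexadecimal spelling of the byte reads the same as the decimal number it encodes, which is what makes the format convenient for decimal routines.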
Alternate words
Following "bit," "byte," and "nybble," there have been some analogical attempts to construct unambiguous terms for bit blocks of other sizes. [4] All of these are strictly jargon, not techspeak, and not very common.
★ 2 bits: crumb, quad, quarter, tayste, tydbit
★ 4 bits: nibble, nybble
★ 5 bits: nickle
★ 10 bits: deckle
★ 16 bits: playte, chawmp (on a 32-bit machine)
★ 18 bits: chawmp (on a 36-bit machine)
★ 32 bits: dynner, gawble (on a 32-bit machine)
★ 48 bits: gawble (under circumstances that remain obscure)
Abbreviation/Symbol
IEEE 1541 and Metric-Interchange-Format specify "B" as the symbol for byte (e.g. MB means megabyte), whilst IEC 60027 seems silent on the subject.
Furthermore, B means bel (see decibel), another (logarithmic) unit used in the same field.
The use of B to stand for bel is consistent with the metric system convention that capitalized symbols are for units named after a person (in this case Alexander Graham Bell); usage of a capital B to stand for byte is not consistent with this convention. The unit symbol "kb" with a lowercase "b" is also commonly understood to stand for "kilobyte."
IEEE 1541 specifies "b" as the symbol for bit; however, IEC 60027 and Metric-Interchange-Format specify "bit" (e.g. Mbit for megabit) for the symbol, achieving maximum disambiguation from byte.
"b" vs. "B" confusion seems to be common enough to have inspired the creation of a dedicated website, b is not B.
French-speaking countries sometimes use an uppercase "o" for "octet". This is not allowed in SI because of the risk of confusion with the zero and the convention that capitals are reserved for unit names derived from proper names, e.g., A = ampere, J = joule; s = second, m = metre.
Lowercase "o" for "octet" is a commonly used symbol in several non-English-speaking countries, and is also used with metric prefixes (for example, "ko" and "Mo").
Names for different units
The prefixes used for byte measurements are usually the same as the SI prefixes used for other measurements, but have slightly different values. The former are based on powers of 1,024 (2^10), a convenient binary number, while the SI prefixes are based on powers of 1,000 (10^3), a convenient decimal number. The table below illustrates these differences. See binary prefix for further discussion.
| Prefix | Name | SI meaning | Binary meaning | Size difference |
|---|---|---|---|---|
| K or k | kilo | 10^3 = 1000^1 | 2^10 = 1024^1 | 2.40% |
| M | mega | 10^6 = 1000^2 | 2^20 = 1024^2 | 4.86% |
| G | giga | 10^9 = 1000^3 | 2^30 = 1024^3 | 7.37% |
| T | tera | 10^12 = 1000^4 | 2^40 = 1024^4 | 9.95% |
| P | peta | 10^15 = 1000^5 | 2^50 = 1024^5 | 12.59% |
| E | exa | 10^18 = 1000^6 | 2^60 = 1024^6 | 15.29% |
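The "size difference" column can be reproduced with a short calculation: for the n-th prefix it is the amount by which 1024 to the power n exceeds 1000 to the power n, as a percentage of the decimal value. The following Python sketch shows the arithmetic; the function name `size_difference_percent` is chosen only for this example.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]


def size_difference_percent(n):
    """Percentage by which 1024**n exceeds 1000**n."""
    return (1024 ** n / 1000 ** n - 1) * 100


for n, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {size_difference_percent(n):.2f}%")
# kilo: 2.40%
# mega: 4.86%
# giga: 7.37%
# tera: 9.95%
# peta: 12.59%
# exa: 15.29%
```

The gap widens with each prefix because the 2.4% difference at the kilo level compounds every time the exponent increases.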
Note that since 1998, the IEC, and later the IEEE, have standardized a set of binary prefixes to avoid consumer confusion between the binary and decimal meanings:
| Prefix | Name | Unit | Size in bytes | Equivalent |
|---|---|---|---|---|
| kibi | binary kilo | 1 kibibyte (KiB) | 2^10 bytes | 1024 bytes |
| mebi | binary mega | 1 mebibyte (MiB) | 2^20 bytes | 1024 KiB |
| gibi | binary giga | 1 gibibyte (GiB) | 2^30 bytes | 1024 MiB |
| tebi | binary tera | 1 tebibyte (TiB) | 2^40 bytes | 1024 GiB |
| pebi | binary peta | 1 pebibyte (PiB) | 2^50 bytes | 1024 TiB |
| exbi | binary exa | 1 exbibyte (EiB) | 2^60 bytes | 1024 PiB |
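To see how the two conventions diverge for a concrete quantity, a byte count can be formatted once with decimal SI prefixes and once with the binary prefixes from the table above. The following is a minimal Python sketch; the helper `format_bytes` and its parameters are hypothetical, and the two-decimal rounding is an arbitrary choice for the example.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]          # powers of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]   # powers of 1024


def format_bytes(count, base, units):
    """Scale `count` by repeated division by `base` and attach the matching unit."""
    value = float(count)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


count = 1_000_000_000
print(format_bytes(count, 1000, SI_UNITS))   # 1.00 GB
print(format_bytes(count, 1024, IEC_UNITS))  # 953.67 MiB
```

The same one billion bytes is "1.00 GB" under the decimal convention but falls short of one gibibyte, which is the kind of discrepancy the binary prefixes make explicit.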
Fractional information is usually measured in bits, nibbles, nats, or bans, where the latter two are used mainly in the context of information theory and not in computing generally.
See also
★ Bit
★ Word (computing)
Notes
1. Origins of the Term "BYTE" Bob Bemer, accessed 2007-08-12
2. TIMELINE OF THE IBM STRETCH/HARVEST ERA (1956-1961) computerhistory.org, '1956 July ... Werner Buchholz ... Werner's term "Byte" first popularized'
3. byte catb.org, 'coined by Werner Buchholz in 1956'
4. nybble reference.com sourced from Jargon File 4.2.0, accessed 2007-08-12