String (computing)
Latest revision as of 16:01, 22 October 2024
In computer programming languages, a string is a data type which consists of a list of characters arranged together in sequence (like a string of pearls in a necklace).
How strings are stored
In some languages, a string is simply a list of characters, with some convenient helper methods that make it behave more like a block of text. In Haskell, for example, the standard String type is defined as a list of characters, [Char].
A string seen as an array
Because a string is a list, many programming languages let you use array or list operations on strings: getting the n-th element of the list returns the n-th character of the string.
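A minimal sketch in Python (used here only as an illustration; indexing rules differ between languages) of treating a string as a list of characters:

```python
word = "hello"

second = word[1]            # indexing is zero-based, so this is "e"
last = word[-1]             # negative indices count back from the end: "o"
length = len(word)          # number of characters: 5
chars = [c for c in word]   # iterating yields one character at a time

print(second, last, length, chars)
```

Unlike a Python list, a Python string is immutable: assigning word[0] = "j" raises a TypeError, so "changing" a character means building a new string.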
One byte, or two?
With most traditional text encodings, such as ASCII, each member of the list is a single character stored in a single byte. So the word 'hello' contains five characters and occupies five bytes. With the introduction of Unicode, many programming languages now support multibyte encodings such as UTF-8, in which some characters take a single byte and others take several. In the Java programming language (and many languages which run on the Java platform: JRuby, Scala, Groovy, etc.), strings can contain any Unicode character. Java's String class stores text as 16-bit UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two units, and methods such as length() count code units rather than characters. The Python programming language had a separate unicode datatype in Python 2; since Python 3, the ordinary str type holds Unicode text, and raw bytes use a separate bytes type. Ruby strings have carried an explicit encoding since version 1.9; earlier versions needed extra libraries for multibyte support.
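A short Python 3 sketch makes the difference between characters and bytes concrete. len() on a str counts Unicode code points, while encoding the text shows how many bytes a given encoding actually needs. UTF-16 is included because it is the representation behind Java's String class; the sample strings are arbitrary choices for illustration.

```python
# "hello", "héllo" (with a precomposed é, U+00E9), and a single emoji (U+1F600)
samples = ["hello", "h\u00e9llo", "\U0001F600"]

results = {}
for text in samples:
    code_points = len(text)                        # characters (code points)
    utf8_bytes = len(text.encode("utf-8"))         # bytes needed in UTF-8
    utf16_bytes = len(text.encode("utf-16-le"))    # bytes in UTF-16, no byte-order mark
    utf16_units = utf16_bytes // 2                 # each UTF-16 code unit is 2 bytes
    results[text] = (code_points, utf8_bytes, utf16_units)
    print(f"{text!r}: {code_points} chars, {utf8_bytes} UTF-8 bytes, "
          f"{utf16_units} UTF-16 code units")
```

The emoji is one character but four UTF-8 bytes and two UTF-16 code units, which is why Java's length() reports 2 for it.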
Conversion
Strings can be implicitly or explicitly converted into other data types, depending on the programming language. Consider the following statement:
print "My favourite number is " + 5
In many languages, the literal 5 represents an integer. In languages such as Java and JavaScript, it is automatically converted into the string "5" and appended to the preceding string, giving "My favourite number is 5". Now consider the following:
if (10 == "10") { /* ... */ }
In some languages, such as Python, this condition is not satisfied: it compares an integer with a string, and the types do not match. For many uses, though, this strictness is pedantic and unnecessary. If the string were converted into an integer, it would equal the integer it is compared with; likewise, if the integer were converted into a string, it would equal the string. Languages such as JavaScript and PHP perform this kind of conversion in their == operator, so there the condition holds.
When a language performs such a conversion automatically, it is called implicit conversion (or type coercion). Some languages, Scala for instance, let programmers specify how implicit conversions happen by declaring implicit conversion functions.
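Python is one of the languages that performs neither conversion above implicitly. The following minimal sketch restates both examples from this section in Python with the conversions made explicit. The function loose_equal is hypothetical, written only to imitate a coercing == operator for integer/string pairs; it is not part of Python.

```python
# 1. Concatenating a string and an integer.
try:
    implicit = "My favourite number is " + 5   # Python does not convert 5 to "5"
except TypeError:
    implicit = None                             # ...it raises TypeError instead

explicit = "My favourite number is " + str(5)  # explicit conversion with str()

# 2. Comparing an integer with a string.
strict = 10 == "10"          # False: an int never equals a str in Python
as_ints = 10 == int("10")    # True: the string converted to an integer
as_strs = str(10) == "10"    # True: the integer converted to a string


def loose_equal(a, b):
    """Hypothetical helper imitating a coercing == for int/str pairs:
    the string operand is converted to an int before comparing.
    A non-numeric string makes int() raise ValueError."""
    if isinstance(a, int) and isinstance(b, str):
        b = int(b)
    elif isinstance(a, str) and isinstance(b, int):
        a = int(a)
    return a == b


print(explicit)                                        # My favourite number is 5
print(strict, as_ints, as_strs, loose_equal(10, "10"))  # False True True True
```

Real coercion rules, such as those of JavaScript or PHP, cover many more cases (floating-point numbers, surrounding whitespace, non-numeric strings); this sketch handles only integers.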
String manipulation
Some languages were developed specifically for manipulating strings, such as awk and SNOBOL. String-handling capability is found in more and more general-purpose programming languages, but especially in "scripting" languages such as Perl, PHP, and Python.
Abstractly, there are a number of common string operations, although their details vary from language to language:
Operation | Parameters | Result |
---|---|---|
Concatenation | string1, string2 | string1string2 |
Substring | string1, integer1 [, integer2] | integer1 characters of string1, starting at the first character, or at position integer2 if it is given |
Fields | string1, string2 (or a character) | An array of the strings taken from string1 that were separated by string2 |
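The three operations in the table can be sketched in Python as follows. The function names mirror the table and are hypothetical; the underlying Python features are the + operator, slicing, and str.split. Python slices take a start and an end index rather than a count, so substring adapts them to the table's parameter order, assuming integer2 is a zero-based position.

```python
from typing import List


def concatenation(string1: str, string2: str) -> str:
    # string1 followed directly by string2
    return string1 + string2


def substring(string1: str, integer1: int, integer2: int = 0) -> str:
    # integer1 characters of string1, starting at zero-based position integer2
    # (the first character when integer2 is omitted)
    return string1[integer2:integer2 + integer1]


def fields(string1: str, string2: str) -> List[str]:
    # the pieces of string1 that were separated by string2
    return string1.split(string2)


print(concatenation("foo", "bar"))      # foobar
print(substring("hello world", 5))      # hello
print(substring("hello world", 5, 6))   # world
print(fields("a,b,c", ","))             # ['a', 'b', 'c']
```

Other languages order these parameters differently; Java's String.substring, for instance, takes a start and an end index, so the table describes the operations only abstractly.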