String (computing)

From Citizendium
imported>Ed Poor
m (spelling)
imported>Ed Poor
(headings)
Revision as of 09:59, 17 April 2010

This article is a stub and thus not approved.

In computer programming languages, a string is a data type that consists of a list of characters arranged in sequence (like pearls strung on a necklace).

How strings are stored

In some languages, a string is simply a list of characters accompanied by helper methods that make it convenient to handle as a block of text.

A string seen as an array

Because a string is a list, many programming languages let you apply array or list processing methods to it: getting the n-th member of the list returns the n-th character in the string.
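As an illustration, here is a minimal sketch in Python (chosen only as an example language; the variable names are hypothetical). Python counts positions from zero, so the n-th character sits at index n-1.

```python
word = "hello"

# Indexing a string works like indexing a list (positions start at 0).
first = word[0]       # the 1st character
third = word[2]       # the 3rd character
last = word[-1]       # negative indices count from the end

# Slicing, another list operation, yields a shorter string.
middle = word[1:4]

# A string can be turned into an actual list of characters and back.
letters = list(word)
backwards = "".join(reversed(letters))
```

Other languages offer comparable operations under different names, and some count positions from one rather than zero.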

One byte, or two?

With most traditional text encoding methods, each member of the list is a single character stored in a single byte, so the word 'hello' contains five characters and thus occupies five bytes. With the introduction of Unicode, many programming languages now support multibyte string encodings, in which some characters take a single byte and others take several. In the Java programming language (and many languages which run on the Java platform: JRuby, Scala, Groovy etc.), strings can contain Unicode characters and the string methods are designed with them in mind, although some, such as length(), count 16-bit code units rather than whole characters. Python 2 has a separate Unicode datatype (unicode), while in Python 3 the standard str type is itself Unicode. The Ruby language supports multibyte string encoding from version 1.9 onward, or in earlier versions by using extra libraries.
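The difference between counting characters and counting bytes can be sketched as follows. This example assumes Python 3 and the UTF-8 encoding, neither of which the article prescribes, and the variable names are hypothetical.

```python
ascii_word = "hello"
accented_word = "h\u00e9llo"   # "héllo", written with an escape for é (U+00E9)

# len() on a Python 3 string counts characters (code points).
ascii_chars = len(ascii_word)
accented_chars = len(accented_word)

# Encoding to UTF-8 produces bytes; len() then counts bytes.
ascii_bytes = len(ascii_word.encode("utf-8"))
accented_bytes = len(accented_word.encode("utf-8"))

print(ascii_chars, ascii_bytes)        # 5 5
print(accented_chars, accented_bytes)  # 5 6
```

Both words have five characters, but the accented letter needs two bytes in UTF-8, so the second word occupies six bytes. Other encodings would give different byte counts.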

Conversion

Strings can be implicitly or explicitly converted into other datatypes depending on the programming language. Consider the following statement:

print "My favourite number is " + 5

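In many languages, the literal 5 represents an integer. In languages that convert implicitly, such as JavaScript, it is automatically converted into the string '5' and appended to the preceding string; other languages, such as Python, reject the statement until the integer is converted explicitly. Now consider the following: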

if (10 == "10") { /* ... */ }

In some languages, this conditional will not be satisfied: it compares an integer with a string, and the types do not match. For many uses, however, such strict matching is unnecessary. If the string were converted into an integer, it would equal the integer it is being compared with; similarly, if the integer were converted into a string, it would equal the string it is being compared with.

When a language performs such a conversion automatically, it is called implicit conversion. Some languages (Scala, for instance) allow one to describe how implicit conversions happen by declaring implicit type conversion functions.
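For contrast, the sketch below shows how the two statements above behave in a language that requires explicit conversion. Python 3 is used only as an example, and the variable names are hypothetical.

```python
# Explicit conversion: the integer is turned into a string by hand.
greeting = "My favourite number is " + str(5)

# Without the conversion, Python refuses to mix the two types.
try:
    "My favourite number is " + 5
    implicit_allowed = True
except TypeError:
    implicit_allowed = False

# An integer and a string never compare equal without conversion...
mixed_equal = (10 == "10")

# ...but they do once either side is converted explicitly.
as_integers = (10 == int("10"))
as_strings = (str(10) == "10")
```

A language with implicit conversion would perform the equivalent of str(5) or int("10") on the programmer's behalf.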

String manipulation

Some languages, such as AWK and SNOBOL, were developed specifically for manipulating strings. String-handling capability is found in more and more general-purpose programming languages, especially "scripting" languages such as Perl, PHP, and Python.

Discussed abstractly, there are a number of common string operations, the details of which vary with the language:

| Operation | Parameters | Result |
| --- | --- | --- |
| Concatenation | string1, string2 | string1string2 |
| Substring | string1, integer1[, integer2] | integer1 characters of string1, starting at the first character unless integer2 is specified as a starting point |
| Fields | string1, string2 (or character) | An array of strings taken from string1, which were separated by string2 |
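The three abstract operations might be sketched in Python as below. The function names are hypothetical, chosen to mirror the table, and the starting point integer2 is assumed to be counted from zero, which the table does not specify.

```python
def concatenate(string1, string2):
    """Join string2 onto the end of string1."""
    return string1 + string2


def substring(string1, integer1, integer2=0):
    """Return integer1 characters of string1.

    Starts at the first character unless integer2 gives a starting
    index (assumed 0-based here).
    """
    return string1[integer2:integer2 + integer1]


def fields(string1, string2):
    """Split string1 into a list of strings wherever string2 occurs."""
    return string1.split(string2)


print(concatenate("foo", "bar"))        # foobar
print(substring("hello world", 5))      # hello
print(substring("hello world", 5, 6))   # world
print(fields("a,b,c", ","))             # ['a', 'b', 'c']
```

Real languages usually provide these operations as built-in operators or methods, with their own parameter order and indexing conventions.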
