Python Strings: Immutable Unicode Collections for AI & ML

Learn about Python strings, immutable Unicode collections crucial for data processing in AI and Machine Learning. Explore single, double, and triple quotes.

1.5 Python Strings

A string in Python is an immutable sequence of Unicode characters. Once a string is created, its characters cannot be changed in place. Each character corresponds to a unique number (its code point) in the Unicode standard. The string itself is still treated as text, not as a number, even if it contains only digits.
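The three claims above (code points, digits as text, immutability) can be checked with a minimal sketch. The variable names are illustrative only; ord(), chr(), and int() are standard built-ins.

```python
# Each character maps to a Unicode code point; ord() and chr() convert between the two.
code_point = ord("A")        # 65
character = chr(8364)        # the euro sign, U+20AC

# A string of digits is still text and must be converted before doing arithmetic.
digits = "123"
as_number = int(digits) + 1  # 124

# Immutability: assigning to an index raises TypeError.
word = "cat"
try:
    word[0] = "b"
    mutated = True
except TypeError:
    mutated = False

print(code_point, character, as_number, mutated)
```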

In Python, strings are always enclosed in quotes:

  • Single quotes (')
  • Double quotes (")
  • Triple quotes (''' or """)

# These are all valid string declarations
'Hello World'
"Hello World"
'''Hello World'''
"""Hello World"""

All four literals produce the same string value; the quote style is not part of the string.

Creating Strings

You can use any of the three quote styles, as long as the starting and ending quotes match.

msg1 = 'Python is fun!'
msg2 = "Python is fun!"
msg3 = '''Python is fun!'''
msg4 = """Python is fun!"""

print(msg1, msg2, msg3, msg4)

Accessing Characters in a String

Python does not have a separate character type; a single character is simply a string of length one. You can access characters using indexing and slicing:

text = "Programming"

# Accessing a single character by index (0-based)
print(text[0])     # Output: P

# Slicing a portion of the string
print(text[3:7])   # Output: gram (from index 3 up to, but not including, index 7)
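As a small extension of the example above, the sketch below shows that an indexed character is itself a str, and uses two standard features not covered in the text: negative indices and a slice step. The variable names are illustrative.

```python
text = "Programming"

first = text[0]          # a str of length 1, not a separate character type
last = text[-1]          # negative indices count from the end
every_other = text[::2]  # the optional third slice value is the step
clipped = text[8:100]    # slices that run past the end are clipped, not an error

print(type(first).__name__, len(first))
print(last)
print(every_other)
print(clipped)
```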

Updating Strings

Since strings are immutable, you cannot modify them directly. Instead, you create a new string based on an existing one.

greeting = "Good Evening"
new_greeting = greeting[:5] + "Morning"  # Take the first 5 chars and append "Morning"
print(new_greeting)   # Output: Good Morning
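To make the immutability concrete, here is a minimal sketch that first attempts an in-place change (which fails) and then builds the new string two ways. The flag and variable names are illustrative; replace() is a standard str method.

```python
greeting = "Good Evening"

# Strings support neither item nor slice assignment.
try:
    greeting[5:] = "Morning"
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Both approaches build a new string and leave the original untouched.
via_slicing = greeting[:5] + "Morning"
via_replace = greeting.replace("Evening", "Morning")

print(assignment_worked)
print(via_slicing == via_replace)
print(greeting)
```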

Escape Sequences

Escape characters start with a backslash (\) and are used to represent special characters that cannot be directly typed or have a special meaning within a string.

Escape Code    Description
\n             Newline
\t             Horizontal tab
\\             Backslash
\'             Single quote
\"             Double quote
\r             Carriage return
\b             Backspace

print("Line1\nLine2")
# Output:
# Line1
# Line2

print("He said: \"Python is great!\"")
# Output: He said: "Python is great!"
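A few more entries from the table can be sketched the same way. Note that each escape sequence is two characters in the source code but a single character in the resulting string. The sample values are made up for illustration.

```python
row = "name\tage"         # \t inserts one tab character
path = "C:\\temp\\new"    # \\ produces one literal backslash each

print(row)
print(path)               # C:\temp\new

newline_length = len("\n")        # the escape is one character in the string
backslashes = path.count("\\")    # two literal backslashes in path
path_length = len(path)
print(newline_length, backslashes, path_length)
```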

String Operators

You can perform various operations on strings using operators:

Assume:

a = "Hello"
b = "World"

Operator    Description                                      Example         Result
+           Concatenation                                    a + b           'HelloWorld'
*           Repetition                                       a * 2           'HelloHello'
[]          Indexing                                         a[1]            'e'
[:]         Slicing                                          a[1:4]          'ell'
in          Membership (exists)                              'H' in a        True
not in      Membership (not exists)                          'Z' not in a    True
r"..."      Raw string prefix (ignores escape sequences)     r"\n"           '\\n' (a backslash followed by n)
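The raw string row is the least obvious one, so here is a minimal sketch comparing a normal literal with a raw one. The file path is a made-up example value.

```python
escaped = "\n"    # one character: a newline
raw = r"\n"       # two characters: a backslash and the letter n

# Raw strings are handy for text full of backslashes, such as Windows paths.
windows_path = r"C:\new\table.txt"

print(len(escaped), len(raw))
print(raw == "\\n")
print(windows_path)
```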

String Formatting with the % Operator

The % operator allows for C-style string formatting, inserting values into a string template.

name = "Alice"
age = 30
print("Name: %s, Age: %d" % (name, age))
# Output: Name: Alice, Age: 30

Common Format Symbols:

Symbol    Meaning
%s        String
%d        Integer
%f        Floating point
%x        Hexadecimal (lowercase)
%X        Hexadecimal (uppercase)
%o        Octal
%e        Scientific notation (e)
%g        Compact format (chooses %f or %e style based on the value's magnitude)
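The symbols in the table can be tried out with a short sketch. The sample values are arbitrary, and the ".2" precision in "%.2f" is standard printf-style syntax that the table does not list.

```python
value = 255
ratio = 3.14159265

as_int = "%d" % value               # '255'
as_hex = "%x" % value               # 'ff'
as_hex_upper = "%X" % value         # 'FF'
as_octal = "%o" % value             # '377'
default_float = "%f" % ratio        # '3.141593' (six decimal places by default)
two_places = "%.2f" % ratio         # '3.14'
scientific = "%e" % 12345.678       # '1.234568e+04'
compact_small = "%g" % 0.00001234   # '1.234e-05'
compact_plain = "%g" % 1234.5       # '1234.5'

print(as_int, as_hex, as_hex_upper, as_octal)
print(default_float, two_places, scientific)
print(compact_small, compact_plain)
```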

Embedding Quotes

To include quotes within a string, use a different type of quote for the string itself, or escape the inner quotes.

msg1 = 'He said "Python is easy!"'
msg2 = "It's a beautiful day."
print(msg1)
print(msg2)
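The example above uses the first option (a different outer quote). A minimal sketch of the second option, escaping the inner quotes, might look like this. The names msg3 to msg5 are illustrative.

```python
msg3 = 'It\'s a beautiful day.'          # escaped single quote inside single quotes
msg4 = "He said \"Python is easy!\""     # escaped double quotes inside double quotes
msg5 = """She said "it's easy" today."""   # triple quotes allow both kinds unescaped

print(msg3)
print(msg4)
print(msg5)
```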

Triple Quotes for Multiline Strings

Triple quotes allow strings to span multiple lines directly, preserving the line breaks.

paragraph = """This is a
multi-line
string example."""
print(paragraph)
# Output:
# This is a
# multi-line
# string example.
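"Preserving the line breaks" means the string contains real newline characters, exactly as if they had been written with \n. A short sketch, with illustrative variable names:

```python
paragraph = """This is a
multi-line
string example."""

# The same value written on one line with explicit newline escapes.
single_line_form = "This is a\nmulti-line\nstring example."

same_value = paragraph == single_line_form
newline_count = paragraph.count("\n")
lines = paragraph.splitlines()

print(same_value, newline_count)
print(lines)
```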

Arithmetic Operations with Strings

Strings are non-numeric data types. Therefore, arithmetic operations like subtraction (-), multiplication (* with strings as both operands), and division (/) between two strings are not allowed.

# print("Hello" - "World")  # Invalid operation
# Output: TypeError: unsupported operand type(s) for -: 'str' and 'str'

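Apart from % formatting, the only arithmetic operators that work with strings are + (concatenation) and * (repetition with an integer). To combine a string with a number, convert the number with str() first.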
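A minimal sketch of what works and what does not. The variable names and sample values are illustrative.

```python
count = 3

# Mixing str and int with + raises TypeError.
try:
    label = "Items: " + count
    mixed_ok = True
except TypeError:
    mixed_ok = False

label = "Items: " + str(count)     # convert the number to text first
divider = "-" * 10                 # repetition needs an integer operand
total = int("40") + int("2")       # convert digit strings to do real arithmetic

print(mixed_ok)
print(label)
print(divider)
print(total)
```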

Identifying String Type

In Python, every string is an object of the str class. You can verify this using the type() function.

message = "Learning Python is fun!"
print(type(message))
# Output: <class 'str'>
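In conditions, isinstance() is usually more convenient than comparing type() results. A brief sketch with illustrative names:

```python
message = "Learning Python is fun!"

is_text = isinstance(message, str)        # True
digits_are_text = isinstance("42", str)   # True: quoted digits are still a str
number_is_text = isinstance(42, str)      # False: this is an int
converted = str(42)                       # '42'

print(is_text, digits_are_text, number_is_text, converted)
```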

Python Built-in String Methods

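Python provides a rich set of built-in methods to manipulate strings. Because strings are immutable, each method returns a new value and leaves the original string unchanged: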

  • capitalize(): Converts the first character to uppercase and the rest to lowercase.
  • casefold(): Returns a lowercased form intended for caseless comparison (more aggressive than lower()).
  • center(width, fillchar): Centers the string with optional padding.
  • count(sub, start, end): Counts occurrences of a substring.
  • encode(encoding): Encodes the string to a bytes representation.
  • endswith(suffix): Checks if the string ends with the specified suffix.
  • expandtabs(tabsize): Converts tabs to spaces.
  • find(sub): Returns the lowest index of a substring; returns -1 if not found.
  • format(): Formats the string using placeholders.
  • index(sub): Same as find(), but raises a ValueError if the substring is not found.
  • isalnum(): Returns True if all characters are alphanumeric.
  • isalpha(): Returns True if all characters are alphabetic.
  • isascii(): Returns True if all characters are ASCII.
  • isdecimal(): Returns True if all characters are decimal characters.
  • isdigit(): Returns True if all characters are digits.
  • isidentifier(): Checks if the string is a valid Python identifier.
  • islower(): Checks if all cased characters are lowercase.
  • isnumeric(): Returns True if all characters are numeric (digits plus other Unicode numeric characters such as fractions).
  • isprintable(): Checks if all characters are printable.
  • isspace(): Returns True if the string contains only whitespace.
  • istitle(): Checks if the string is in title case.
  • isupper(): Checks if all cased characters are uppercase.
  • join(iterable): Concatenates items from an iterable with the string as a separator.
  • ljust(width, fillchar): Left-justifies the string.
  • lower(): Converts all characters to lowercase.
  • lstrip(): Removes leading whitespace, or any leading characters from a set given as an argument.
  • maketrans(): Returns a translation table for translate().
  • partition(sep): Splits the string into three parts at the first occurrence of the separator.
  • removeprefix(prefix): Removes the specified prefix if present (Python 3.9+).
  • removesuffix(suffix): Removes the specified suffix if present (Python 3.9+).
  • replace(old, new): Replaces occurrences of a substring.
  • rfind(sub): Returns the highest index of a substring; returns -1 if not found.
  • rindex(sub): Same as rfind(), but raises a ValueError if the substring is not found.
  • rjust(width, fillchar): Right-justifies the string.
  • rpartition(sep): Splits the string into three parts at the last occurrence of the separator.
  • rsplit(sep, maxsplit): Splits the string from the end.
  • rstrip(): Removes trailing whitespace, or any trailing characters from a set given as an argument.
  • split(sep, maxsplit): Splits the string into a list of substrings.
  • splitlines(): Splits the string at line breaks.
  • startswith(prefix): Checks if the string starts with the specified prefix.
  • strip(): Removes leading and trailing whitespace, or any leading and trailing characters from a set given as an argument.
  • swapcase(): Swaps the case of all letters.
  • title(): Converts the string to title case.
  • translate(table): Applies a translation table to the string.
  • upper(): Converts all characters to uppercase.
  • zfill(width): Pads the string on the left with zeros to a specified width.
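A handful of the methods listed above, sketched on made-up sample values. Each call returns a new object, and the original string is never modified.

```python
raw = "  Hello, World!  "

cleaned = raw.strip()                              # 'Hello, World!'
shouted = cleaned.upper()                          # 'HELLO, WORLD!'
words = cleaned.split(", ")                        # ['Hello', 'World!']
joined = "-".join(words)                           # 'Hello-World!'
replaced = cleaned.replace("World", "Python")      # 'Hello, Python!'
position = cleaned.find("World")                   # 7
missing = cleaned.find("xyz")                      # -1 (index() would raise ValueError)
capitalized = "hello WORLD".capitalize()           # 'Hello world'
titled = "hello world".title()                     # 'Hello World'
padded = "7".zfill(3)                              # '007'
head, sep, tail = "key=value".partition("=")       # 'key', '=', 'value'
folded = "Straße".casefold()                       # 'strasse' (lower() keeps the ß)

print(cleaned, shouted, joined, replaced)
print(position, missing, capitalized, titled, padded)
print(head, sep, tail, folded)
print(repr(raw))                                   # the original is unchanged
```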

Built-in Functions for Strings

Some general built-in functions are particularly useful when working with strings:

  • len(string): Returns the number of characters in the string.
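  • max(string): Returns the character with the highest Unicode code point. This is the alphabetically last character only when all letters share the same case.
  • min(string): Returns the character with the lowest Unicode code point. Digits and uppercase letters have lower code points than lowercase letters.

name = "Python3"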
print(len(name))     # Output: 7
print(max(name))     # Output: y
print(min(name))     # Output: 3
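Because the comparison is by code point, mixed-case text can give surprising results. The sketch below uses a made-up sample value and the standard key parameter of max() and min() to compare case-insensitively.

```python
mixed = "appleZ"

highest = max(mixed)   # 'p': lowercase letters have higher code points than 'Z'
lowest = min(mixed)    # 'Z': uppercase letters sort before lowercase ones

# Passing key=str.lower compares the characters as if they were all lowercase.
highest_ignoring_case = max(mixed, key=str.lower)   # 'Z'
lowest_ignoring_case = min(mixed, key=str.lower)    # 'a'

print(ord("Z"), ord("a"))
print(highest, lowest)
print(highest_ignoring_case, lowest_ignoring_case)
```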