1976s


Python String Data Type

2023. 3. 17., 1976s

In Python, a string is a sequence of characters enclosed in quotation marks (' or "). Strings are an immutable data type: once created, a string cannot be changed.

"Life is too short, You need Python"
'b'
"12334"

The "12334" above looks like a number to us, but because it is enclosed in quotation marks (' or ") it is a string.

a = "Pithon"
a[1] = "y" # TypeError: 'str' object does not support item assignment

The error message says that a 'str' object does not allow a new value to be assigned to one of its positions.

String: a sequence of characters; an immutable sequence type that cannot be modified.
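To see immutability in action, here is a minimal sketch. The variable names (word, fixed, error_message) are made up for illustration. It catches the TypeError instead of letting the program stop, then builds a corrected copy with replace().

```python
# Hypothetical example values; the names are for illustration only.
word = "Pithon"

try:
    word[1] = "y"          # item assignment is not allowed on str
except TypeError as err:
    error_message = str(err)
    print(error_message)   # 'str' object does not support item assignment

# Methods such as replace() return a new string; the original is unchanged.
fixed = word.replace("i", "y", 1)
print(word)    # Pithon
print(fixed)   # Python
```

The original string still reads "Pithon" afterwards, which is what immutable means in practice.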

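Creating Strings

There are four ways to write a string in Python.

1. Enclose it in double quotes

"Hello World"

2. Enclose it in single quotes

'Hello World'

3. Enclose it in three consecutive double quotes (""")

"""Life is too short, You need Python"""

4. Enclose it in three consecutive single quotes (''')

'''Life is too short, You need Python'''

Why four ways?

Both quote characters (' and ") exist so that a string can contain the other one: a string that includes a single quote can be wrapped in double quotes, and vice versa. Triple quotes are used when the text spans several lines.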

a = "python's favorite food is perl"
b = '"Python is very easy." he says'

c = '''
    Life is too short
    You need python
    '''
d = """
    Life is too short
    You need python
    """

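Of course, you can also write the line break with '\n'. Unlike the triple-quoted versions above, this string has no leading line break or indentation.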

c = "Life is too short\nYou need python"
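A triple-quoted string keeps every line break and every space of indentation that appears between the quotes. The sketch below uses repr() to make that visible and, as one possible cleanup, textwrap.dedent() plus strip() from the standard library. The names block, one_line, and cleaned are illustrative only.

```python
import textwrap

block = """
    Life is too short
    You need python
    """
# repr() shows the hidden line breaks and indentation.
print(repr(block))
# '\n    Life is too short\n    You need python\n    '

one_line = "Life is too short\nYou need python"

# Removing the common indentation and the surrounding whitespace
# gives the same text as the "\n" version.
cleaned = textwrap.dedent(block).strip()
print(cleaned == one_line)  # True
```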

Escape Codes

An escape code is a special combination of characters inside a string that stands for a character that is hard to type directly. Each one is written as a backslash (\) followed by a character.

Code	Description
\n	Newline
\t	Tab
\'	Single quote
\"	Double quote
\\	Backslash

print("Hello\nWorld")  # Hello
                        # World

print("Hello\tWorld")  # Hello   World

print('She said, "I\'m fine."')  # She said, "I'm fine."

print("C:\\Users\\username\\Documents")  # C:\Users\username\Documents
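One detail that is easy to miss: although an escape code is typed as two characters, it is stored as a single character. A small check with len(), using made-up variable names, might look like this:

```python
newline = "\n"
print(len(newline))   # 1: the escape code is a single character

path = "C:\\Users"
print(len(path))      # 8: each "\\" is stored as one backslash
print(path)           # C:\Users
```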

String Operations

String concatenation

head = "Python"
tail = " is fun"
print(head + tail + "!") # Python is fun!

String repetition

s = "spam " * 3
print(s)  # spam spam spam

Application

a = "My Program"
print("=" * 50)
print(a)
print("=" * 50)

Result

==================================================
My Program
==================================================
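The same idea can be wrapped in a small function. banner() below is a hypothetical helper (not a built-in) that combines repetition and concatenation; the width and line_char parameters are assumptions added for illustration.

```python
def banner(title, width=50, line_char="="):
    """Return the title framed by two lines of repeated characters."""
    line = line_char * width                 # repetition
    return line + "\n" + title + "\n" + line  # concatenation

print(banner("My Program"))
# Match the line length to the title instead of a fixed 50.
print(banner("My Program", width=len("My Program"), line_char="-"))
```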

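String Indexing and Slicing

Indexing means pointing at something: it refers to the position numbers given to the items of a sequence object (list, tuple, string, bytes, bytearray). Put simply, you use indexing to pick out the one value you want.

Slicing means cutting something out: it uses those same position numbers to extract part of a sequence object (list, tuple, string).

Indexing

To index, put the index of the position you want inside square brackets ([]); the value at that position is selected.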

s8 = "Python"
print(s8[0])  # P
print(s8[2])  # t
print(s8[-1])  # n

Indexing starts at [0]. Counting from the end starts at [-1]. [-0] is the same as [0].
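A short sketch of how negative indexes relate to positive ones, and what happens when an index is outside the string. The variable names are illustrative.

```python
s = "Python"

last = s[-1]
same_last = s[len(s) - 1]   # -1 is shorthand for len(s) - 1
print(last, same_last)      # n n
print(s[-0] == s[0])        # True, because -0 is just 0

try:
    s[6]                    # valid indexes are 0..5 (or -6..-1)
except IndexError as err:
    index_error = str(err)
    print(index_error)      # string index out of range
```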

Slicing

a = "Hello, World!"
print(a[:])  # Hello, World!
print(a[0:5])  # Hello
print(a[7:])  # World!
print(a[:5])  # Hello
print(a[6:-2]) # ' Worl'

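a[start:stop:step]

If step is omitted, it defaults to 1.

a[0:5] covers a[0] = H, a[1] = e, ..., a[4] = o, that is, every index i with 0 <= i < 5. Because counting starts at 0, the slice ends with the 5th character, a[4]; a[5] is not included.

a[7:] runs from the 8th character, a[7], to the end.

a[:5] runs from the first character through the 5th, a[4].

a[6:-2] runs from the 7th character, a[6] (the space), through the third from the end, a[-3]; a[-2] is not included.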
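The slices above all leave the step at its default of 1. As a quick sketch of the third part of a[start:stop:step], here are a few slices with an explicit step (the variable names are made up):

```python
a = "Hello, World!"

every_second = a[::2]    # indexes 0, 2, 4, ...
reversed_text = a[::-1]  # a negative step walks backwards
middle = a[1:10:3]       # indexes 1, 4, 7

print(every_second)   # Hlo ol!
print(reversed_text)  # !dlroW ,olleH
print(middle)         # eoW
```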

Application

a = "Pithon"
b = a[:1] + "y" + a[2:]
print(b) # Python
c = a[0] + "y" + a[2:]
print(c) # Python

d = [1, 2, 3, ['a', 'b', 'c'], 4, 5]
print(d[2:5]) # [3, ['a', 'b', 'c'], 4]
print(d[3][:2]) # ['a', 'b']
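The "Pithon" fix generalizes to any position. replace_at() below is a hypothetical helper written for this post, and it assumes a non-negative index that is inside the string.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced.

    Assumes 0 <= index < len(text); negative indexes are not handled.
    """
    return text[:index] + new_char + text[index + 1:]

print(replace_at("Pithon", 1, "y"))  # Python
print(replace_at("cat", 2, "r"))     # car
```

Because strings are immutable, the function returns a new string rather than changing the one passed in.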

String Formatting

String formatting in Python is a way of inserting values, such as the contents of variables, into a string.

Formatting with the '`%`' operator

print("I eat %d apples" %3) # I eat 3 apples
print("I eat %s apples" % "five") # I eat five apples
number = 3
print("I eat %d apples" % number) # I eat 3 apples
day = "three"
print("I ate %d apples. so I was sick for %s days." % (number, day))
# I ate 3 apples. so I was sick for three days.

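String format codes

Code	Description
%d	Decimal integer
%o	Octal integer
%x, %X	Hexadecimal integer (lowercase, uppercase)
%f	Floating-point number
%e, %E	Floating-point number in exponent notation (lowercase, uppercase)
%g, %G	Floating-point number in exponent or plain notation, chosen by the size of the exponent (lowercase, uppercase)
%c	Single character
%s	String
%r	String produced by repr() of the object

Interestingly, the '%s' format code can take a value of any type, because the value is converted to a string with str() before it is inserted.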
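Here is a small sketch that exercises several of the format codes from the table, including '%s' with values that are not strings. The input values and variable names are arbitrary examples.

```python
octal = "%o" % 8                      # 10
hexes = "%x %X" % (255, 255)          # ff FF
sci = "%e" % 12345.678                # 1.234568e+04
general_small = "%g" % 0.00001234     # 1.234e-05
general_plain = "%g" % 1234.5         # 1234.5
char = "%c" % 65                      # A
s_vs_r = "%s and %r" % ("hi", "hi")   # hi and 'hi'

# %s accepts an int, a float, and a list alike.
any_type = "%s %s %s" % (3, 3.14, [1, 2])  # 3 3.14 [1, 2]

for line in (octal, hexes, sci, general_small, general_plain,
             char, s_vs_r, any_type):
    print(line)
```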

Formatting with the '`str.format()`' method

name = 'Alice'
age = 25

print('My name is {} and I am {} years old.'.format(name, age))
# My name is Alice and I am 25 years old.

Each {} is replaced by the corresponding argument passed to format().

Formatting with '`f-strings`'

An f-string is a convenient way to format variables and expressions directly inside a string.

name = 'Alice'
age = 25

print(f'My name is {name} and I am {age} years old.')
# My name is Alice and I am 25 years old.

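Putting f in front of the string literal makes it an f-string (available since Python 3.6). Because the variables are written directly inside {}, the result is easy to read.

Aligning and padding strings to a fixed width

Left-aligned	Centered	Right-aligned
{text:<10s}	{integer:^10d}	{real:>10f}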

string = "Hi"
print('\'%10s\'' % string)
print('\'{:>10}\''.format(string))
print(f'\'{string:>10}\'')

Result

'        Hi' <- Hi is right-aligned
'        Hi'
'        Hi'

As the code above shows, the total width is 10 and the text is right-aligned.

string = "Hi"
print('\'%-10s\'' % string)
print('\'{:<10}\''.format(string))
print(f'\'{string:<10}\'')

Result

'Hi        ' <- Hi is left-aligned
'Hi        '
'Hi        '

As the code above shows, the total width is 10 and the text is left-aligned.

string = "Hi"
width = 10
fill_char = "-"
print('\'{:^10}\''.format(string))
print(f'\'{string:^10}\'')
print(f'\'{string:=^10}\'')
print('\'{:^{}}\''.format(string, width))
print(f'\'{string:^{width}}\'')
print('\'{:{}^{}}\''.format(string, fill_char, width))
print(f'\'{string:{fill_char}^{width}}\'')

Result

'    Hi    '
'    Hi    '
'====Hi===='
'    Hi    '
'    Hi    '
'----Hi----'
'----Hi----'
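The examples so far align a string only. As a rough sketch of the s, d, and f type codes combined with the alignment characters, here they are applied to a string, an integer, and a float. The values and names are arbitrary, and square brackets are used only to make the padding visible.

```python
text, count, ratio = "abc", 42, 3.14

left = f"[{text:<10s}]"     # [abc       ]
center = f"[{count:^10d}]"  # [    42    ]
right = f"[{ratio:>10f}]"   # [  3.140000]

# Without an alignment character, strings default to the left
# and numbers default to the right.
default_text = f"[{text:10s}]"     # [abc       ]
default_number = f"[{count:10d}]"  # [        42]

for line in (left, center, right, default_text, default_number):
    print(line)
```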

Centering with % and `str.center()`

In this example, width sets the total width of the output string. The * in the format string tells Python to take the field width from the width value, and the %s placeholder inserts the string value.

string = "hello"
width = 10
print('\'%*s\'' % (width, string))
# With %, a string is right-aligned in the field by default.
# % formatting has no centering flag, so center the string with str.center() first.
print('\'%*s\'' % (width, string.center(width)))
print('\'%*s\'' % (width, string.center(width, '-')))

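str.center(width[, fillchar]) is a built-in string method that returns the string centered within the given width. It pads the original string with fillchar (a space by default) until the result is width characters long and returns the padded string.

Result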

'     hello'
'  hello   '
'--hello---'

Decimal places

number = 3.14159
string = "The value of pi is approximately %.2f" % number
print(string)

The value of pi is approximately 3.14

In this example, the format specifier '%.2f' says that the floating-point number should be formatted with two digits after the decimal point.
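The same precision syntax carries over to str.format() and f-strings. A minimal comparison, with illustrative variable names:

```python
number = 3.14159

with_percent = "%.2f" % number         # 3.14
with_format = "{:.2f}".format(number)  # 3.14
with_fstring = f"{number:.2f}"         # 3.14

# Width and precision can be combined: 10 wide, 3 decimal places.
padded = f"{number:10.3f}"             # '     3.142'

print(with_percent, with_format, with_fstring)
print(repr(padded))
```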

Substituting values by position with `str.format`

print('I eat {0} apples'.format(3))
print('I eat {0} apples and {1} oranges'.format(3, 2))
print('I eat {1} apples and {0} oranges'.format(2, 'five'))

The number inside the braces is the index of the argument passed to format(): {0} is replaced by the first argument and {1} by the second.

Result

I eat 3 apples
I eat 3 apples and 2 oranges
I eat five apples and 2 oranges
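Because the numbers are argument indexes, the same index can appear more than once. The sketch below also shows that empty {} and numbered {0} cannot be mixed in one format string; the flag name mixing_failed is made up for this example.

```python
repeated = "{0} and {0} again, then {1}".format("spam", "eggs")
print(repeated)  # spam and spam again, then eggs

# Mixing automatic {} and manual {0} numbering raises ValueError.
try:
    "{} {0}".format("a", "b")
    mixing_failed = False
except ValueError:
    mixing_failed = True
print(mixing_failed)  # True
```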

Substituting values by name with `str.format` (named placeholders)

name = "Alice"
age = 30
occupation = "engineer"
string = "My name is {name}, I'm {age} years old, and I work as an {occupation}."
formatted_string = string.format(name=name, age=age, occupation=occupation)
print(formatted_string)

Result

My name is Alice, I'm 30 years old, and I work as an engineer.

Substituting values with `f-strings`

name = "Alice"
age = 30
occupation = "engineer"
string = f"My name is {name}, I'm {age} years old, and I work as an {occupation}."
print(string)

Result

My name is Alice, I'm 30 years old, and I work as an engineer.

f-strings support expressions (variables combined with operators such as + and -).

age = 30
string = f"I will be {age+1} years old next year."
print(string)

Result

I will be 31 years old next year.

f-strings can also read values from a dictionary (a data type that stores key-value pairs).

a = { 'name':'Alice', 'age': 30, 'occupation':'engineer'}
string = f"My name is {a['name']}, I'm {a['age']} years old, and I work as an {a['occupation']}."
print(string)

Result

My name is Alice, I'm 30 years old, and I work as an engineer.

Showing literal '{' and '}' characters in an f-string

print(f'{{30}}') # {30}
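Doubled braces and a real placeholder can sit side by side. A short sketch with an illustrative age variable:

```python
age = 30

literal = f"{{age}}"    # doubled braces only: no substitution happens
wrapped = f"{{{age}}}"  # "{{" + value of age + "}}"

print(literal)  # {age}
print(wrapped)  # {30}
```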

String functions

len()

'len()' returns the length of a string.

a = "Life is too short"
print(len(a)) # Output: 17

count()

The 'count()' method returns the number of times a given substring appears in a string.

string = "Hello, World!"
a = string.count('l')
print(a)  # Output: 3

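The str.count() method also accepts optional start and end arguments after the substring to search for. The second argument is the index at which the search starts, and the third is the index at which it stops (exclusive). If they are omitted, the whole string is searched.

string = "Hello, World!"
count = string.count("l", 4) # Start searching at index 4.
print(count)  # Output: 1

string.count("l", 4) counts how many times the character "l" appears in string from index 4 onward and returns that number. The two "l" characters in "Hello" are at indexes 2 and 3, so only the "l" in "World" is counted.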
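To make the effect of the start and end arguments concrete, here is a sketch that counts "l" over different ranges of the same string. The "l" characters sit at indexes 2, 3, and 10.

```python
string = "Hello, World!"

whole = string.count("l")          # indexes 2, 3, 10
from_3 = string.count("l", 3)      # indexes 3, 10
from_4 = string.count("l", 4)      # index 10 only
first_3 = string.count("l", 0, 3)  # index 2 only; end is exclusive
two_chars = string.count("lo")     # substrings can be longer than 1

print(whole, from_3, from_4, first_3, two_chars)  # 3 2 1 1 1
```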


upper()

'upper()' converts every character of a string to uppercase.

string = "Hello, World!"
upper_case = string.upper()
print(upper_case)  # Output: HELLO, WORLD!

lower()

'lower()' converts every character of a string to lowercase.

string = "Hello, World!"
lower_case = string.lower()
print(lower_case)  # Output: hello, world!

capitalize()

'capitalize()' converts the first character of a string to uppercase and the remaining characters to lowercase.

string = "hello, world!"
capitalized = string.capitalize()
print(capitalized)  # Output: Hello, world!

replace()

'replace()' returns a copy of the string in which a given substring is replaced by another string.

string = "Hello, World!"
new_string = string.replace("World", "Python")
print(new_string)  # Output: Hello, Python!
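replace() also takes an optional third argument that limits how many occurrences are replaced. A brief sketch with made-up example text:

```python
string = "spam spam spam"

all_of_them = string.replace("spam", "eggs")    # every occurrence
first_only = string.replace("spam", "eggs", 1)  # at most one

print(all_of_them)  # eggs eggs eggs
print(first_only)   # eggs spam spam
print(string)       # spam spam spam (the original is unchanged)
```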

split()

'split()' splits a string at a separator and returns the pieces as a list.

string = "Hello, World!"
split_string = string.split(",")
print(split_string)  # Output: ['Hello', ' World!']
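How split() treats whitespace depends on whether a separator is given. The sketch below compares the no-argument form, an explicit " " separator, and the optional maxsplit argument; the sample strings are arbitrary.

```python
line = "  Life is   too short  "

# No argument: split on runs of whitespace and drop empty pieces.
words = line.split()
print(words)     # ['Life', 'is', 'too', 'short']

# Explicit separator: every single " " is a split point.
by_space = "a b  c".split(" ")
print(by_space)  # ['a', 'b', '', 'c']

# maxsplit limits the number of splits.
limited = "a,b,c,d".split(",", 1)
print(limited)   # ['a', 'b,c,d']
```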

join()

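'join()' combines the string elements of a list into a single string, placing the string it is called on between the elements.

words = ["Hello", "World", "!"]  # named words so the built-in list is not shadowed
joined_string = " ".join(words)
print(joined_string)  # Output: Hello World !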
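join() only works on strings, so other values have to be converted first. A minimal sketch, with illustrative names, that also shows join() undoing split():

```python
numbers = [1, 2, 3]

# Joining integers directly raises TypeError.
try:
    ", ".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

# Convert each item with str() first.
joined = ", ".join(str(n) for n in numbers)
print(joined)       # 1, 2, 3

# split() and join() are rough opposites.
round_trip = " ".join("Hello World !".split())
print(round_trip)   # Hello World !
```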

startswith()

'startswith()' returns whether a string starts with a given string.

string = "Hello, World!"
starts_with_hello = string.startswith("Hello")
print(starts_with_hello)  # Output: True

endswith()

'endswith()' returns whether a string ends with a given string.

string = "Hello, World!"
ends_with_world = string.endswith("World!")
print(ends_with_world)  # Output: True
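Both methods accept a tuple of candidates and an optional start index. The filename below is a made-up example, and lowercasing it first is just one way to make the check case-insensitive.

```python
filename = "report.CSV"

# A tuple checks several endings at once.
is_table = filename.lower().endswith((".csv", ".tsv"))
print(is_table)        # True

# The second argument is the index where the comparison begins.
string = "Hello, World!"
world_at_7 = string.startswith("World", 7)
print(world_at_7)      # True

# Both checks are case-sensitive.
print(string.startswith("hello"))  # False
```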

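Wrapping up

I wrote this blog post partly as a way of studying. Strings are used so often that I tried to organize everything carefully, but it still feels incomplete. I kept adding whatever seemed necessary as I studied, so the post ended up long.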