Handling Strings in Python
Parag Verma
Introduction
In this blog we will look at some of the most important functions for working with strings. We will cover the following functionality:
- Dividing string into chunks
- Replacing a given sequence with another
- Removing leading and trailing spaces
- Finding a string pattern
Creating a string
Let's create a string and check its length
str1 = "Sample string"
print("Text in str1 is '",str1,"' and the length ",len(str1))
Text in str1 is ' Sample string ' and the length 13
Formatting a string using the % operator
weight = 75
name = 'Parag'
print(" My Name is : %s \t weight is :%i" %(name,weight))
My Name is : Parag weight is :75
We can see that the % operator formats the string before print displays it. The following format specifiers control how each value is represented in the output:
- %d : Integer
- %s : string
- %f : float
- %i : integer
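As a small sketch of the specifiers listed above, the example below combines %s, %d and %f in one string and limits the float to two decimal places with %.2f. The variable height and its value are assumptions made up for illustration; they are not part of the original example.

```python
# Values used only to illustrate the specifiers
name = 'Parag'
weight = 75
height = 1.756  # assumed value, not from the original example

# %s -> string, %d -> integer, %.2f -> float rounded to 2 decimal places
line = "Name: %s, weight: %d, height: %.2f" % (name, weight, height)
print(line)
```

Here %.2f rounds 1.756 to 1.76, while a plain %f would show six decimal places.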
Concatenation Operation using “+”
str1 = "Sample string"
str2 = " Second string"
print(str1+str2)
Sample string Second string
Repetition using asterisk
str1 = "Sample string"
str2 = " Second string"
print(str1*2 + str2*3)
Sample stringSample string Second string Second string Second string
Using “in” & “not in” membership
str1 = "Sample string"
flag1='a' in str1
print(flag1)
True
flag2='x' not in str1
print(flag2)
True
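The same operators also work with whole substrings, not only single characters. A minimal sketch, reusing str1 from above (the variable names has_word and has_upper are illustrative):

```python
str1 = "Sample string"

# Membership works for substrings of any length
has_word = "string" in str1

# The check is case-sensitive, so the upper-case form is not found
has_upper = "STRING" in str1

print(has_word, has_upper)
```

This prints True for the first check and False for the second, because membership tests compare characters exactly.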
Using double quotes and backslash in strings
If the string itself contains double quotes, we can escape them with a backslash
print(" Use backslash when string needs to have quotes \"Special\"")
Use backslash when string needs to have quotes "Special"
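An alternative that avoids escaping, shown here as a minimal sketch, is to wrap the string in single quotes when it contains double quotes. Both forms produce the same string (the variable names are illustrative):

```python
# Double quotes escaped with a backslash
escaped = "quotes \"Special\""

# Same text wrapped in single quotes, so no escaping is needed
single_wrapped = 'quotes "Special"'

print(escaped == single_wrapped)
```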
Using “for loop” to iterate through the strings
str1 = "Sample string"
for i in str1:
    print(i,end =" ")
S a m p l e s t r i n g
String Functions
We will look at the string manipulation functions below. These are among the most widely used in day-to-day programming tasks
- lower
- upper
- capitalize
- join
- strip
- split
- rsplit
- lstrip
- rstrip
- find
- replace
lower
x = "this IS A strINg "
print(x.lower())
this is a string
upper
x = "this IS A strINg "
print(x.upper())
THIS IS A STRING
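capitalize
capitalize is included in the list above but has no example of its own. A minimal sketch, reusing the same input as the lower and upper examples (the variable cap is illustrative):

```python
x = "this IS A strINg "

# capitalize upper-cases the first character and lower-cases the rest
cap = x.capitalize()
print(cap)
```

The result is "This is a string " with the trailing space left in place, since capitalize only changes letter case.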
join method to concatenate two strings
a = "Python join method"
b = "is a powerful method"
print(';'.join([a,b]))
Python join method;is a powerful method
a = "Python join method"
b = "is a powerful method"
print(','.join([a,b]))
Python join method,is a powerful method
using list with join function
words = ['Python','is','suitable','for','ML']
print(' '.join(words))
Python is suitable for ML
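join expects every element to be a string and raises a TypeError otherwise. A minimal sketch of one common workaround, converting each element with str first (the list values is made up for illustration):

```python
# Hypothetical list of numbers, used only for illustration
values = [3, 7, 42]

# Convert each number to a string before joining
joined = '-'.join(str(v) for v in values)
print(joined)
```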
‘rstrip’ trailing spaces from a string
The last element of the words list has an extra trailing space, and the loop below appends one more. We can use the rstrip function to remove them
str_t = ""
words = ['Python','is','suitable','for','ML ']
for i in words:
    str_t =str_t+i+" "
print(str_t)
Python is suitable for ML
print(str_t.rstrip(' '))
Python is suitable for ML
Running a similar example on a tuple
words_t = ('Python','is','suitable','for','ML ')
print(', '.join(words_t))
Python, is, suitable, for, ML
ls=', '.join(words_t)
print(ls.rstrip(' '))
Python, is, suitable, for, ML
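Another option, sketched below, is to strip each element before joining instead of cleaning up the joined result. This also removes stray spaces from elements in the middle of the sequence, which rstrip on the final string would not reach (the name cleaned is illustrative):

```python
words_t = ('Python', 'is', 'suitable', 'for', 'ML ')

# Strip whitespace from every element, then join
cleaned = ', '.join(w.strip() for w in words_t)
print(cleaned)
```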
split and rsplit
Output of a split is a list
a = "Python join method with split and rsplit"
print(a.split(' '))
['Python', 'join', 'method', 'with', 'split', 'and', 'rsplit']
We can restrict the number of splits
a = "Python join method with split and rsplit"
print(a.split(' ',2))
['Python', 'join', 'method with split and rsplit']
As seen above, passing 2 as the second parameter performs at most two splits, so the resulting list has 3 elements: the first two words, followed by the rest of the string left unsplit
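Because the number of elements is known when the split limit is fixed, the result can be unpacked directly into variables. A minimal sketch, using the keyword form maxsplit of the same parameter (the variable names are illustrative):

```python
a = "Python join method with split and rsplit"

# At most 2 splits -> exactly 3 parts for this string
first, second, rest = a.split(' ', maxsplit=2)

print(first)
print(second)
print(rest)
```

Note that unpacking like this fails with a ValueError if the string contains fewer than two spaces.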
split on multiple delimiters
Here we will use the split function from the ‘re’ library. In the pattern, the delimiters are separated from each other by the pipe (|) operator
c = " This is ; a string: with * multiple /n delimiters"
# Importing the re library
import re
# Raw string so that \* reaches the regex engine as an escaped asterisk
print(re.split(r';|:|\*|/n|/',c))
[' This is ', ' a string', ' with ', ' multiple ', ' delimiters']
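The pieces returned above still carry their surrounding spaces. As a sketch of one way to tidy them, the pattern below puts the single-character delimiters in a character class and strips each piece afterwards. The pattern is an equivalent rewrite for this particular string, not the only possible form:

```python
import re

c = " This is ; a string: with * multiple /n delimiters"

# [;:*] matches any one of ; : * (the asterisk is literal inside a class);
# /n is kept as a separate two-character alternative
parts = [p.strip() for p in re.split(r'[;:*]|/n', c)]
print(parts)
```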
rsplit : scans the string from the end
The r in rsplit stands for right. Hence it scans the string from the right end
a = "Python join method with split and rsplit"
print(a.rsplit(' '))
['Python', 'join', 'method', 'with', 'split', 'and', 'rsplit']
You would imagine that there is no difference between split and rsplit. Let's look at the example below to understand the difference
a = "Python join method with split and rsplit"
print(a.rsplit(' ',2))
['Python join method with split', 'and', 'rsplit']
We can see that the difference becomes clear once we limit the number of splits: rsplit keeps the left part of the string intact, whereas split keeps the right part intact
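A typical use of rsplit with a limit is separating a file extension from a name that itself contains dots. The sketch below uses a made-up file name purely for illustration:

```python
# Hypothetical file name, used only for illustration
filename = "report.final.csv"

# Split once from the right: everything before the last dot, and the extension
stem, ext = filename.rsplit('.', 1)

# Splitting once from the left cuts at the first dot instead
left_parts = filename.split('.', 1)

print(stem, ext)
print(left_parts)
```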
strip, lstrip and rstrip
These functions are used during text mining and data cleaning. Examples include removing leading and trailing spaces or any other set of characters.
- strip: removes leading and trailing whitespace
- lstrip: removes leading whitespace
- rstrip: removes trailing whitespace
a = " Python join method with split and rsplit "
print(a.strip())
Python join method with split and rsplit
a = " Python join method with split and rsplit "
print(a.lstrip())
Python join method with split and rsplit
a = " Python join method with split and rsplit "
print(a.rstrip())
Python join method with split and rsplit
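All three functions also accept an optional argument naming the characters to remove, which covers the "any other pattern" case mentioned above. A minimal sketch with asterisks instead of spaces (the string raw is made up for illustration):

```python
# Hypothetical input padded with asterisks
raw = "***Python join method***"

both = raw.strip('*')    # remove from both ends
left = raw.lstrip('*')   # remove from the left end only
right = raw.rstrip('*')  # remove from the right end only

print(both)
print(left)
print(right)
```

The argument is treated as a set of characters rather than a fixed prefix or suffix, so strip('*-') would remove any mix of asterisks and hyphens from the ends.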
find and rfind functions
- find returns the index of the first occurrence of the substring
- rfind returns the index of the last occurrence of the substring
- Both return -1 if the substring is not found
a = "Python join method with split and with rsplit"
print(a.find("join"))
7
We got 7 as the result because join starts at index position 7
a = "Python join method with split and with rsplit"
print(a.find("joi"))
7
We still got 7, as find returns the index position where the match begins
Let's retrieve the string from the index where the substring is found
a = "Python join method with split and with rsplit"
print(a[a.find("join"):])
join method with split and with rsplit
Let's try and create a function to find a string in the text
def find_str(source_string,search_string):
    # Look up the index once and reuse it
    idx = source_string.find(search_string)
    if idx != -1:
        print("string found at index: %d \t String is :%s" %(idx,source_string[idx:]))
    else:
        print("String not found")
a = "Python join method with split and with rsplit"
find_str(a,"method")
string found at index: 12 String is :method with split and with rsplit
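The examples so far only use find. A minimal sketch of rfind on the same string, where "with" appears twice (the variable names are illustrative):

```python
a = "Python join method with split and with rsplit"

first_idx = a.find("with")   # index of the first occurrence
last_idx = a.rfind("with")   # index of the last occurrence
missing = a.rfind("xyz")     # -1 because the substring is absent

print(first_idx, last_idx, missing)
```

Here find reports 19 and rfind reports 34, the start positions of the two occurrences of "with".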
count: counts the number of occurrences of a substring
a = "Python join method with split and with rsplit"
print(a.count('with',1,30))
1
We got 1 because the second occurrence of with starts beyond index position 30
a = "Python join method with split and with rsplit"
print(a.count('with',1,50))
2
Once we increased the index range, we were able to capture the other ‘with’ as well
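The start and end arguments are optional; without them count searches the whole string. The sketch below also shows that count matches substrings rather than whole words, which is easy to overlook (the variable names are illustrative):

```python
a = "Python join method with split and with rsplit"

# No range given: the whole string is searched
total_with = a.count('with')

# 'split' is matched as a word and also inside 'rsplit'
split_hits = a.count('split')

print(total_with, split_hits)
```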
replace
a = "Python join method with split and with rsplit"
print(a.replace(" ",""))
Pythonjoinmethodwithsplitandwithrsplit
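replace also takes an optional third argument that limits how many occurrences are replaced. A minimal sketch, swapping only the first "with" for "using" (the replacement word is chosen only for illustration):

```python
a = "Python join method with split and with rsplit"

# Replace only the first occurrence of 'with'
b = a.replace("with", "using", 1)
print(b)

# Strings are immutable, so the original is unchanged
print(a)
```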
Final Comments
In this blog we saw how to work with text strings and how to use the different functions associated with them. This will enhance our understanding of how to manipulate a string, convert it into a list, use the join function, and use the split function from the re (regular expression) library