I wrote a TUI application to help you practice Python regular expressions. There are more than 100 exercises covering both the built-in `re` module and the third-party `regex` module.
If you have pipx, use `pipx install regexexercises` to install the app. See the repo for source code and other details.
Thanks for sharing this. I took the time to read through the documentation of the `re` module. Here’s my review of the functions.

Useful:
- `re.finditer` returns an iterator over all Match objects
- `re.search` returns the first Match object, or None if there are no matches
- `r''`: use raw strings for patterns so you don’t have to worry about backslashes
- the optional `flags` argument modifies the behaviour (case insensitive, multiline)
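A minimal sketch of the items in the list above, using made-up sample text (the variable names are only for illustration):

```python
import re

text = "Alice: 3 apples, bob: 12 pears"  # made-up sample input

# re.search returns the first Match object, or None when nothing matches.
first = re.search(r"\d+", text)
missing = re.search(r"\d+", "no digits here")

# re.finditer yields one Match object per non-overlapping match.
spans = [(m.group(0), m.span()) for m in re.finditer(r"\d+", text)]

# Raw strings keep backslashes literal: "\b" is a backspace character,
# while r"\b" is the two characters the regex engine reads as a word boundary.
raw_len, cooked_len = len(r"\b"), len("\b")

# flags modify matching; re.IGNORECASE lets [a-z] match "A" as well.
names = [m.group(0) for m in re.finditer(r"\b[a-z]+(?=:)", text, flags=re.IGNORECASE)]

print(first.group(0), missing, spans, names)
```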
Utility:
- `re.sub` replaces each match in the string
- `re.split` splits a string by a regular expression
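As a rough illustration of these two helpers, assuming an invented input line:

```python
import re

line = "2023-12-01;  alpha , beta,gamma"  # made-up sample input

# re.sub replaces every non-overlapping match; here each run of
# whitespace collapses to a single space.
collapsed = re.sub(r"\s+", " ", line)

# The replacement can refer back to capturing groups with \1, \2, ...
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", line)

# re.split cuts the string wherever the pattern matches.
fields = re.split(r"\s*[;,]\s*", line)

print(collapsed)
print(swapped)
print(fields)
```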
The Match object:
- `match.group(0)` returns the portion of text matched by the pattern
- `match.group(1)` returns the first capturing group
- `match.group(2)` returns the second capturing group, and so on
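A short sketch of how the Match accessors behave, on an invented sentence. Note that the singular `group(n)` selects one group, while the plural `groups()` returns all capturing groups as a tuple (its optional argument is a default for groups that did not take part in the match).

```python
import re

m = re.search(r"(\d+)m (\d+)s", "You have 12m 30s left")  # made-up input

whole = m.group(0)       # text matched by the entire pattern
first = m.group(1)       # first capturing group
second = m.group(2)      # second capturing group
everything = m.groups()  # tuple of all capturing groups

print(whole, first, second, everything)
```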
I don’t understand why these exist:
- `re.match`: like `search`, but only matches at the beginning of the string. Why not just use `^` or `\A` in the pattern you pass to `search`?
- `re.fullmatch`: like `search`, but only if the full string matches. Why not just use `\A` and `\Z` in the pattern you pass to `search`?
- `re.findall`: returns all matches. It seems like a shitty version of `finditer`. The function has three different return types, which depend on the pattern you pass to the function. Who wants to work with that?
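For reference, the three `findall` result shapes mentioned above can be sketched as follows (sample string invented for illustration). The shape depends on how many capturing groups the pattern has, whereas `finditer` always yields Match objects.

```python
import re

s = "a1 b22"  # made-up sample input

# No capturing group: list of whole matches.
no_group = re.findall(r"[a-z]\d+", s)

# Exactly one capturing group: list of that group's text.
one_group = re.findall(r"[a-z](\d+)", s)

# Two or more capturing groups: list of tuples.
two_groups = re.findall(r"([a-z])(\d+)", s)

# finditer gives Match objects regardless of the number of groups.
uniform = [m.group(0) for m in re.finditer(r"([a-z])(\d+)", s)]

print(no_group, one_group, two_groups, uniform)
```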
I would argue that having distinct `match` and `search` helps readability. The difference between `match('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s)` and `search('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s)` is clear without the need for me to parse the regular expression myself. It also helps code reuse. Consider that you have `PHONE_NUMBER_REGEX` defined somewhere. If you only had a method to “search” but not to “match”, you would have to do something like `search(rf"\A{PHONE_NUMBER_REGEX}\Z", s)`, which is error-prone and less readable. Most likely you would end up having at least two sets of precompiled regex objects (i.e. `PHONE_NUMBER_REGEX` and `PHONE_NUMBER_FULLMATCH_REGEX`). It is also a fairly common practice in other languages’ regex libraries (cf. [1,2]). Golang, which is usually very reserved in the number of ways to express the same thing, has 16 different matching methods[3].

Regarding `re.findall`, I see what you mean, however I don’t agree with your conclusions. I think it is a useful convenience method that improves readability in many cases. I’ve found these usages in my code, and I’m quite happy that this method was available[4]:

    digits = [digit_map[digit] for digit in re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
    [(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)

[1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
[2] https://en.cppreference.com/w/cpp/regex
[4] https://github.com/search?q=repo%3Ahades%2Faoc23 findall&type=code
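To make the points in this reply concrete, here is a hedged sketch. The phone-number pattern and the sample strings are hypothetical (the reply never shows the real `PHONE_NUMBER_REGEX`); they are chosen only to show how wrapping a shared pattern in anchors can go wrong, and how the two `findall` usages behave.

```python
import re

# Hypothetical shared pattern with a top-level alternation.
PHONE_NUMBER_REGEX = r"\d{3}-\d{4}|\d{7}"
s = "555-1234 ext. 9"  # made-up input with trailing text

# Naive anchoring: \A binds only to the first alternative and \Z only to
# the second, so this still matches despite the trailing text.
naive = re.search(rf"\A{PHONE_NUMBER_REGEX}\Z", s)

# Correct anchoring needs an extra non-capturing group...
grouped = re.search(rf"\A(?:{PHONE_NUMBER_REGEX})\Z", s)

# ...whereas fullmatch reuses the pattern unchanged.
full = re.fullmatch(PHONE_NUMBER_REGEX, s)
full_ok = re.fullmatch(PHONE_NUMBER_REGEX, "5551234")

# findall with a lookahead reports overlapping words: the lookahead itself
# consumes nothing, so "eight" and "two" can share the letter "t".
words = r"one|two|three|four|five|six|seven|eight|nine|[0-9]"
overlapping = re.findall(rf"(?=({words}))", "eightwo3")
plain = re.findall(words, "eightwo3")

# Unpacking a one-element findall result; a group that did not take part
# in the match comes back as an empty string.
[(minutes, seconds)] = re.findall(
    r"You have (?:(\d+)m )?(\d+)s left to wait", "You have 45s left to wait"
)

print(naive.group(0), grouped, full, full_ok.group(0))
print(overlapping, plain, (minutes, seconds))
```

The naive version silently accepts `"555-1234 ext. 9"`, which is the kind of mistake a dedicated `fullmatch` avoids.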
Thank you for the very thorough reply! This is the kind of high-quality stuff you love to see on Lemmy. Your use cases seem very valid.



