I am attempting to understand what this snippet of code does:
passwd1=re.sub(r'^.*? --', ' -- ', line)
password=passwd1[4:]
I understand that the top line uses regex to remove the " -- ", and the bottom line I think removes something as well? I went back to this code after a while and need to improve it, but to do that I need to understand it again. I've been trying to read the regex docs to no avail. What is this r'^.*? at the beginning of the regex?
>Solution :
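To break r'^.*? --' into pieces:
- The r in front of a string literal in Python marks it as a raw string (not a regex string): backslashes are kept as written instead of being treated as escape sequences. This lets you not have to do a bunch of confusing character escaping in patterns.
- The ^ tells the regex to match only from the beginning of the string.
- .*? tells the regex to match any number of characters (other than a newline), but as few as possible, up to...
- ' --' (a space followed by two hyphens), which is a literal match.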
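As a rough illustration of the pieces above, the sketch below shows what the r prefix does and what the pattern matches on its own. The sample line 'Google -- MyPassword' is only an assumed example of the input format.

```python
import re

# A normal string interprets backslash escapes; a raw string keeps them as-is.
normal_len = len('\n')   # one character: a newline
raw_len = len(r'\n')     # two characters: a backslash and the letter n
print(normal_len, raw_len)

# This pattern contains no backslashes, so here the r prefix changes nothing.
same_pattern = (r'^.*? --' == '^.*? --')
print(same_pattern)

# ^ anchors at the start; .*? grows only until the first ' --' is reached.
match = re.match(r'^.*? --', 'Google -- MyPassword')
matched_text = match.group(0)
print(repr(matched_text))  # 'Google --'
```

The r prefix is a habit worth keeping for regex patterns anyway, since most non-trivial patterns end up containing backslashes such as \d or \s.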
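The sum of this is that it will match everything from the beginning of the line up to and including the first ' --' demarcation. Since it is re.sub(), the matched part of the string will be replaced with ' -- ' (space, two hyphens, space).
This is why something like 'Google -- MyPassword' becomes ' --  MyPassword': the inserted ' -- ', followed by the space that was already in front of the password.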
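To see both lines of the original snippet in action, here is a small sketch that prints the intermediate values with repr() so the spaces are visible. The sample line is an assumed example, and the final strip() is an optional extra that is not in the original code.

```python
import re

line = 'Google -- MyPassword'  # assumed sample input

# Step 1: everything up to and including the first ' --' becomes ' -- '.
passwd1 = re.sub(r'^.*? --', ' -- ', line)
print(repr(passwd1))   # ' --  MyPassword'

# Step 2: the slice drops the first four characters, i.e. the inserted ' -- '.
password = passwd1[4:]
print(repr(password))  # ' MyPassword'

# Optional: remove the leading space that is still left over.
cleaned = password.strip()
print(repr(cleaned))   # 'MyPassword'
```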
The second line is a simple string slice, dropping the first four elements (characters) of the string, which here are exactly the ' -- ' that the substitution just inserted. That makes it superfluous: you could just substitute the match with an empty string like this:
passwd1 = re.sub(r'^.* --', '', line)
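This achieves the same result for any line that contains the ' --' separator once. (A line without the separator comes back unchanged, whereas the original code would still cut off its first four characters.) Note I've dropped the ?, which changes nothing when the separator appears only once. It only makes a difference when ' --' appears more than once in the line, and I don't think you need it for your stated purpose.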
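A quick way to check how the two versions compare is to wrap each in a small function and run them on the same inputs. The function names and sample lines below are made up for illustration only.

```python
import re

def original(line):
    # The two-step version from the question.
    return re.sub(r'^.*? --', ' -- ', line)[4:]

def simplified(line):
    # The one-step version: replace the match with an empty string.
    return re.sub(r'^.* --', '', line)

with_separator = 'Google -- MyPassword'
without_separator = 'NoSeparatorHere'

# With exactly one separator, both give the same result.
print(repr(original(with_separator)))     # ' MyPassword'
print(repr(simplified(with_separator)))   # ' MyPassword'

# Without a separator, only the original mangles the line.
print(repr(original(without_separator)))    # 'paratorHere'
print(repr(simplified(without_separator)))  # 'NoSeparatorHere'
```

If the leftover leading space is unwanted, one option is to include the trailing space in the pattern (r'^.* -- ') or to call strip() on the result.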
On its own, ? will match zero or one of the previous character, and * will match zero or more of the previous character, in this case a ., which is 'any character'. Placed directly after another quantifier such as *, however, ? means something different: it makes that quantifier lazy. So .* is what is known as a greedy quantifier, and .*? a lazy quantifier. That is, the greedy quantifier will match as much as possible, and the lazy will match as little as possible. The difference between ^.*? -- and ^.* -- is what is matched in this case:
Something something -- mypassword -- yourpassword
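In the greedy case, the first two clauses ('Something something -- mypassword') and the ' --' after them are matched and deleted, leaving ' yourpassword'. In the lazy case, only 'Something something --' is deleted, leaving ' mypassword -- yourpassword'. Most passwords don't include spaces, never mind ' -- ', so an extra separator most likely belongs to the label in front rather than to the password, and you probably want to use the greedy version.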
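The greedy and lazy behaviour described above can be checked directly. This sketch uses the example line from this answer; the 'colou?r' pattern is an unrelated toy example added only to show the other meaning of ?.

```python
import re

line = 'Something something -- mypassword -- yourpassword'

# Greedy: .* runs as far as it can, so the match ends at the LAST ' --'.
greedy = re.sub(r'^.* --', '', line)
print(repr(greedy))  # ' yourpassword'

# Lazy: .*? stops as early as it can, so the match ends at the FIRST ' --'.
lazy = re.sub(r'^.*? --', '', line)
print(repr(lazy))    # ' mypassword -- yourpassword'

# For contrast: after a plain character, ? just means "zero or one".
optional_u = [bool(re.fullmatch(r'colou?r', word)) for word in ('color', 'colour')]
print(optional_u)    # [True, True]
```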