Regex: Getting Dates
I’ve recently been reading up on regular expressions and was having some issues getting the data I need. I have an HTML text file with the following content:
- <tbody>
- <tr><td class="n"><a href="../">Parent Directory</a>/</td><td class="m"></td><td class="s">- </td><td class="t">Directory</td></tr>
- <tr><td class="n"><a href="20091103/">20091103</a>/</td><td class="m">2009-Dec-29 19:31:07</td><td class="s">- </td><td class="t">Directory</td></tr>
- <tr><td class="n"><a href="20100116/">20100116</a>/</td><td class="m">2010-Jan-30 01:54:24</td><td class="s">- </td><td class="t">Directory</td></tr>
- <tr><td class="n"><a href="20100130/">20100130</a>/</td><td class="m">2010-Mar-26 05:31:56</td><td class="s">- </td><td class="t">Directory</td></tr>
- <tr><td class="n"><a href="20100730/">20100730</a>/</td><td class="m">2010-Aug-08 15:59:47</td><td class="s">- </td><td class="t">Directory</td></tr>
- <tr><td class="n"><a href="latest/">latest</a>/</td><td class="m">2010-Aug-05 03:46:25</td><td class="s">- </td><td class="t">Directory</td></tr>
- </tbody>
The only part of this HTML file I need is the YYYYMMDD value between <a href="…"> and </a>. I tried a few ways of stripping out the desired date using PowerShell’s -match operator, but I found that -match has its own shortcomings, and it was frustrating the hell out of me. I went to a co-worker of mine who excels at regular expressions; he threw my regex into Regulator, and it gave me the result I wanted. So I knew my pattern was correct, but PowerShell either didn’t like it or wasn’t showing me what I wanted.
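A minimal sketch of the -match behavior that may explain this. The exact commands that were tried are not shown, so the sample strings and the pattern here are assumptions. Against a single string, -match returns a Boolean and fills $Matches with the first match only. Against an array of strings, which is what Get-Content returns, it acts as a filter and returns the matching elements.

```powershell
# Hypothetical sample data: two anchors shaped like the rows above, in one string.
$html = '<a href="20091103/">20091103</a> <a href="20100116/">20100116</a>'

# Assumed pattern (not necessarily the original one): eight digits inside href.
$pattern = '<a href="(\d{8})/">'

# Scalar input: -match returns True/False and populates $Matches
# with the first match only, so the second date is never reported.
$found = $html -match $pattern
$first = $Matches[1]

# Array input: -match returns the elements that match instead of a Boolean.
$lines = @(
    '<a href="../">Parent Directory</a>',
    '<a href="20091103/">20091103</a>'
)
$filtered = $lines -match $pattern

$found      # Boolean result for the scalar case
$first      # first captured date only
$filtered   # the matching line(s), not the captured date
```

Either way, -match on its own never hands back every captured date in one go, which is where a call that returns all matches becomes useful.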
Thus, I ended up using the .NET Regex class and was able to get my desired result:
- $c = gc ".\Desktop\powershell\local_site.txt"
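A minimal sketch of how the remainder of this approach could look, using [regex]::Matches and a foreach loop. The pattern, the inline sample data, and the variable names $pattern, $found, and $dates are illustrative assumptions, not necessarily what the original script used.

```powershell
# Stand-in for the file contents; in the post, $c comes from Get-Content.
$c = @(
    '<tr><td class="n"><a href="../">Parent Directory</a>/</td></tr>',
    '<tr><td class="n"><a href="20091103/">20091103</a>/</td></tr>',
    '<tr><td class="n"><a href="20100116/">20100116</a>/</td></tr>',
    '<tr><td class="n"><a href="latest/">latest</a>/</td></tr>'
)

# Assumed pattern: an anchor whose href and link text are the same eight digits.
$pattern = '<a href="(\d{8})/">\1</a>'

# [regex]::Matches works on a single string, so join the lines explicitly.
$found = [regex]::Matches(($c -join "`n"), $pattern)

# Each Match object carries Groups, Index, Length and so on; the loop keeps
# only the first capture group, which is the YYYYMMDD value.
$dates = foreach ($m in $found) { $m.Groups[1].Value }
$dates
```

The result is named $found rather than $matches because PowerShell variable names are case-insensitive and $Matches is the automatic variable that -match writes to.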
Without the foreach loop, you will get a listing that looks like this:
- Groups : {20091103, 20091103}
- Success : True
- Captures : {20091103}
- Index : 1301
- Length : 32
- Value : 20091103
- ...
- ...



