I’m extracting fields from BibTeX and have run into a small problem: the format allows values to be wrapped in curly brackets OR NOT.
Please find the example text below:
@article{Roxas_2011, title={Social Desirability Bias in Survey Research on Sustainable Development in Small Firms: an Exploratory Analysis of Survey Mode Effect}, volume={21}, ISSN={1099-0836}, url={http://dx.doi.org/10.1002/bse.730}, DOI={10.1002/bse.730}, number={4}, journal={Business Strategy and the Environment}, publisher={Wiley}, author={Roxas, Banjo and Lindsay, Val}, year={2011}, month=sep, pages={223–235} }
As you can see, every field except month has the form x={y}, so a simple regex (PHP preg_match with the mUg flags):
[\s,]+(.*)={(.*[^}])}
does the trick for everything except month=sep.
If I try using ", " as the delimiter, it apparently also splits the authors.
Can you please help me? 🙂
Thanks 🙂
>Solution :
You can use
[\s,]+(.*?)=(?|{([^{}]*)}|(\w+))
Note that you should not use any flags with this regex (although you may add the s flag to make . match line break chars, and the u flag to make \w and \s match all Unicode word/whitespace chars, if you need that).
See the regex demo.
Details
[\s,]+ – one or more whitespace chars and/or commas
(.*?) – Group 1: any zero or more chars other than line break chars, as few as possible
= – a = char
(?|{([^{}]*)}|(\w+)) – a branch reset group matching:
    {([^{}]*)} – a { char, any zero or more chars other than { and } captured into Group 2, a } char
    | – or
    (\w+) – Group 2: one or more word chars.
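For illustration, here is a minimal sketch of how the pattern above might be used from PHP with preg_match_all. The sample entry is a shortened version of the one in the question, and the variable names ($bibtex, $pattern, $fields) are just placeholders, not anything the question prescribes. It also assumes values never contain nested braces, since [^{}]* stops at the first brace.

```php
<?php
// Shortened sample entry (assumption: the real input has the same shape).
$bibtex = '@article{Roxas_2011, title={Social Desirability Bias}, volume={21}, '
        . 'author={Roxas, Banjo and Lindsay, Val}, year={2011}, month=sep, pages={223-235} }';

// Same pattern as in the answer, with no flags.
// The branch reset group (?|...) makes both alternatives capture into Group 2.
$pattern = '/[\s,]+(.*?)=(?|{([^{}]*)}|(\w+))/';

// PREG_SET_ORDER yields one array per match: [full match, key, value].
preg_match_all($pattern, $bibtex, $matches, PREG_SET_ORDER);

// Collect the key/value pairs into an associative array.
$fields = [];
foreach ($matches as $m) {
    $fields[$m[1]] = $m[2];
}

print_r($fields);
```

Because of the branch reset, the value is always in $m[2], whether it was written as x={y} or as a bare word like month=sep, so no extra check for an empty group is needed.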