Regex to find "a" where "an" should have been used
Thread poster: John Fossey
John Fossey  Identity Verified
Canada
Member (2008)
French to English
Aug 31, 2023

I'm looking for a regex command that will find instances where the word "a" has been mistakenly used where "an" should have been used. This happens when the word "a" is followed by a word that starts with a vowel, and sometimes with an "h" or with an acronym that starts with a letter that is pronounced like a vowel, such as an F, H, L, M, N, R, S, X.

This would really help with proofreading a long technical document.


 
Thomas T. Frost  Identity Verified
Portugal
Danish to English
Not all vowels Aug 31, 2023

And not words beginning with a vowel if the beginning is not pronounced like a vowel (Europe, uranium, US …). Complicated.

Have you thought of letting Bard or ChatGPT identify such occurrences? They might be good at this type of task if you define it well.

[Edited at 2023-08-31 14:50 GMT]
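One way to act on Thomas's point is to keep the spelling test but override it with exception lists. The Python sketch below does that. The word lists are hypothetical and far from complete, and all-caps acronyms are deliberately skipped because they depend on how the letters are spoken rather than how the word is spelled.

```python
import re

# Hypothetical, deliberately incomplete exception lists (assumptions, not a full rule set).
# Vowel-spelled words that start with a "y" or "w" sound, so they take "a".
CONSONANT_SOUND_PREFIXES = ("eu", "uran", "univ", "unio", "uniq", "unit", "use", "usu", "uti")
CONSONANT_SOUND_WORDS = {"one", "once"}
# Words whose "h" is silent, so they take "an".
SILENT_H_PREFIXES = ("hour", "honest", "honour", "honor", "heir")

# A standalone "a" (either case), a space, then the next word.
A_THEN_WORD = re.compile(r"\b(a) ([A-Za-z]+)", re.IGNORECASE)


def starts_with_vowel_sound(word: str) -> bool:
    """Rough spelling-based guess at whether `word` begins with a vowel sound."""
    w = word.lower()
    if w.startswith(SILENT_H_PREFIXES):
        return True
    if w in CONSONANT_SOUND_WORDS or w.startswith(CONSONANT_SOUND_PREFIXES):
        return False
    return w[0] in "aeiou"


def find_a_before_vowel_sound(text: str) -> list[str]:
    """List 'a <word>' phrases that probably need 'an'; all-caps acronyms are skipped."""
    hits = []
    for m in A_THEN_WORD.finditer(text):
        word = m.group(2)
        if len(word) > 1 and word.isupper():
            continue  # acronyms follow letter names, so they need a separate check
        if starts_with_vowel_sound(word):
            hits.append(m.group(0))
    return hits


text = "He is a honest man with a apple, a European car, a uranium mine, a US passport and a hour to spare."
print(find_a_before_vowel_sound(text))  # ['a honest', 'a apple', 'a hour']
```

The prefixes are kept narrow on purpose: a broad "uni" would wrongly exempt "unimportant", which takes "an" even though "unique" takes "a". Lists like these cut down false positives but will never be exhaustive, so the output is a proofreading aid, not a verdict.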


 
Dan Lucas  Identity Verified
United Kingdom
Member (2014)
Japanese to English
What flavour? Aug 31, 2023

John Fossey wrote:
I'm looking for a regex command that will find instances where the word "a" has been mistakenly used where "an" should have been used. This happens when the word "a" is followed by a word that starts with a vowel, and sometimes with an "h" or with an acronym that starts with a letter that is pronounced like a vowel, such as an F, H, L, M, N, R, S, X.

In what application will you be using this? Regex flavours vary considerably. As a rough first pass, something like:

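\ba [aeiou]

...should capture a standalone "a" followed by a vowel. Inside square brackets, | is a literal character rather than "or", so it isn't needed, and \b stops the pattern matching the end of a word such as "data". The acronym issue would need a bit more thought.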

Dan
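The two points about the character class and the word boundary are easy to check. The Python sketch below compares the pattern as first posted with a version using a plain character class and \b; Python's re module treats both constructs the same way .NET does. The sample sentence is made up for illustration.

```python
import re

# As first posted: inside [...] the "|" is a literal character, so "a |" also matches,
# and with no \b the final "a" of "Java" counts as the word "a".
naive = re.compile(r"a [a|e|i|o|u]")
# Plain character class plus a word boundary before the "a".
bounded = re.compile(r"\ba [aeiou]")

sample = "Use a apple, not Java extensions or a |pipe|."
print(naive.findall(sample))    # ['a a', 'a e', 'a |']
print(bounded.findall(sample))  # ['a a']
```

Both patterns are case-sensitive here, so a sentence-initial "A apple" would be missed unless the tool's case-insensitive option is switched on.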


 
John Fossey  Identity Verified
Canada
Member (2008)
French to English
TOPIC STARTER
ChatGPT having a bad day Aug 31, 2023

Thomas T. Frost wrote:

Have you thought of letting Bard or ChatGPT identify such occurrences? They might be good at this type of task if you define it well.


ChatGPT seems to be having a bad day. When I asked it (is that the right pronoun?), it just responded "something went wrong"...

[Edited at 2023-08-31 14:48 GMT]


 
John Fossey  Identity Verified
Canada
Member (2008)
French to English
TOPIC STARTER
MemoQ Aug 31, 2023

Dan Lucas wrote:

In what application will you be using this? Regex flavours vary considerably. As a rough first pass, something like:

\ba [aeiou]

...should capture "a" followed by a vowel. The acronym issue would need a bit more thought.

Dan


 
Thomas T. Frost  Identity Verified
Portugal
Danish to English
AI Aug 31, 2023

ChatGPT is having a bad day indeed. I tried Bard, but it gets it wrong. I don't think regex is going to be of much use because both vowels and consonants can require 'a' or 'an' depending on pronunciation.

Consider testing things like the following when ChatGPT wakes up:

'A European card. A uranium mine. A US citizen. An MS Office application. An evil person. A Microsoft application.'
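Thomas's sentences make a useful test set. The Python sketch below runs a spelling-only check over them; the pattern is a case-insensitive variant of Dan's vowel check, assumed here for illustration rather than taken from the thread.

```python
import re

# Thomas's test sentences, verbatim.
SENTENCES = ("A European card. A uranium mine. A US citizen. "
             "An MS Office application. An evil person. A Microsoft application.")

# Spelling-only check: a standalone "a"/"A" followed by a word starting with a vowel letter.
spelling_check = re.compile(r"\ba [aeiou]\w*", re.IGNORECASE)

print(spelling_check.findall(SENTENCES))  # ['A European', 'A uranium', 'A US']
```

All three hits are correct English, so every flag is a false positive, and "An MS" is never examined because the pattern only looks at "a". That bears out the point that spelling alone cannot decide between "a" and "an".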


 
Dan Lucas  Identity Verified
United Kingdom
Member (2014)
Japanese to English
Dot NET then Aug 31, 2023

MemoQ is .NET, so should be fine.

This for catching "a" followed by a vowel.

\ba [aeiou]

And this for "a" followed by an acronym containing at least two capital letters

\ba (?:[A-Z][a-z]*){2,}

My testing has been minimal! These won't catch all cases, but they should get you fairly close. It's a question of the trade-off between time invested in creating the regex and the time saved by using the regex. The perfect is the enemy of the good.

Dan
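For acronyms, a narrower pattern could use the letters John lists (F, H, L, M, N, R, S, X) plus the vowels A, E, I and O, leaving out U because its spoken name starts with a "y" sound ("a US citizen"). A minimal Python sketch follows. The equivalent pattern \b[aA] [AEFHILMNORSX][A-Z]+\b should also be valid .NET syntax, assuming the search runs case-sensitively; with case-insensitive matching, [A-Z] would match lowercase letters too and the pattern would flag ordinary words. The sample sentence is made up.

```python
import re

# Letters whose spoken names begin with a vowel sound: A, E, I, O plus the
# consonants listed in the thread. U is excluded ("you" starts with a y sound).
VOWEL_SOUND_LETTERS = "AEFHILMNORSX"

# Standalone "a"/"A", then an all-caps token of two or more letters whose first
# letter is in the set. Relies on case-sensitive matching so [A-Z] means capitals only.
acronym_check = re.compile(rf"\b[aA] [{VOWEL_SOUND_LETTERS}][A-Z]+\b")

text = "Install a MS Office update on a USB drive, a HTML page, a NATO base and an FAQ page."
print(acronym_check.findall(text))  # ['a MS', 'a HTML', 'a NATO']
```

"a USB" and "an FAQ" are correctly left alone, but acronyms pronounced as words defeat the letter rule: "a NATO base" is correct English and is still flagged.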


 

