Exploring the Uncommon Features in EmEditor's Text Manipulation Capabilities

Scott Lv8

  • November 29, 2007 at 10:36 am #5068
    jugaor
    Participant
    Hi, I tried several versions (5 up to 7 beta) and found the following ‘bugs’, in both manual and script searches (Spanish texts):
    a por (eeFindReplaceOnlyWord)
    matches “creería por”, “CAMPAÑA POR”, etc. (i.e., it breaks words at the accented vowels or at “Ñ”/“ñ”)
    any accented vowel (eeFindReplaceOnlyWord)
    matches “diseñé”, “ENSEÑÓ”, etc. (i.e., it breaks words ending in an accented vowel at the “Ñ”/“ñ”)
    In manual searches (with an open document), it matches all the accented vowels inside words despite “Search Only Word” (i.e., it matches “cómprale”, “mamá”, “después”, etc.)
    (?!es |son)esta(s?)(!|?)
    discards the leading negative subexpression (i.e., it matches “esta!” / “esta?” / “estas!” / “estas?”), even though I use the ‘eeFindReplaceRegExp Or eeFindReplaceOnlyWord’ options
    If I simplify the expression
    (?!es) esta(!|?)
    (?!es)esta(!|?)
    or
    (?!son) estas(!|?)
    (?!son)estas(!|?)
    the behavior is the same. However,
    (¡|¿)esta(s?)(?! es| son)
    excludes the correct ones.
    If you need more information, please email me.
    TIA.
    jugaor
    November 29, 2007 at 7:15 pm #5071
    Yutaka Emura
    Keymaster
    Regarding your first question: for speed, previous versions of EmEditor did not check Unicode characters (character code > U+0080) when matching whole words. In the next beta version, I will add a routine that checks some Latin characters (ch >= 0x00c0 && ch <= 0x02b8). This will not cover all Unicode characters, but it should improve “whole word” accuracy in most cases without sacrificing much speed.
    I am not sure about your second question, but there are two unnecessary spaces in your regular expression: (?!es |son)esta(s?)(!|?)
    One is between “s” and “|”, and the other is between “n” and “)”.
    Does removing these spaces solve your issue?
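The whole-word check described in this reply can be sketched roughly as follows. This is only an illustration in Python, not EmEditor's actual code: the helper names `is_word_char` and `find_whole_word` are hypothetical, and the only detail taken from the reply is the range 0x00c0 to 0x02b8. It assumes the text uses precomposed characters (for example a single code point for “í”).

```python
from typing import List


def is_word_char(ch: str) -> bool:
    """Treat ASCII letters, digits, and "_" as word characters, plus
    every code point in the range 0x00C0-0x02B8 quoted in the reply."""
    if ch.isascii():
        return ch.isalnum() or ch == "_"
    return 0x00C0 <= ord(ch) <= 0x02B8


def find_whole_word(text: str, word: str) -> List[int]:
    """Return the start offsets where `word` occurs with no word
    character immediately before or after it."""
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


if __name__ == "__main__":
    for sample in ("creería por", "CAMPAÑA POR", "voy a por pan"):
        print(sample, find_whole_word(sample.lower(), "a por"))
```

With the extended range, “í” and “Ñ” count as word characters, so “a por” is no longer reported inside “creería por” or “CAMPAÑA POR”. The range check is a single comparison per boundary, which fits the stated goal of not sacrificing much speed.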
    November 30, 2007 at 5:30 am #5074
    jugaor
    Participant
    Hi, thank you very much for your response.
    1. In Spanish, the ‘special’ letters are ÁÉÍÓÚÜ, áéíóúü, Ñ, ñ. I presume that these Unicode chars cover them :)
    2. The spaces are needed, since they’re two whole words:
    “esta” = “this” / “estas” = “these”, both feminine.
    “es” = “is” (singular, verb to be)
    “son” = “are” (plural, verb to be)
    The strange thing is that EmEditor works correctly when the same subexpression comes after the word, not before it (i.e. “(¡|¿)esta(s?)(?! es| son)” is correct).
    I have been trying to use EmEditor to automatically correct misspelled words in Spanish subtitle files. I wrote some complex VBEE scripts for that, and that is how I found the issues above.
    Thanks for your attention,
    jugaor
    PS: please let me know when the new beta is ready :)
    November 30, 2007 at 8:30 pm #5077
    Yutaka Emura
    Keymaster
    (?=pattern) (positive lookahead) and (?!pattern) (negative lookahead) look ahead from the position where the search is currently trying to match.
    For example, the expression “(?=x)x” matches exactly where “x” alone would, and the expression “(?!x)x” never matches.
    So it does not make sense to place (?=pattern) or (?!pattern) at the beginning of a search term in order to test the text that comes before it.
    I will release beta 41 today or tomorrow.
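The point about leading lookaheads can be checked with a short script. This is a sketch using Python's `re` module rather than EmEditor's own engine, so it only illustrates the general lookahead semantics. The sample sentences are made up, and the “?” in the final group is written as `\?` because Python rejects a bare “?” there.

```python
import re

# Lookahead placed BEFORE the word: it inspects the text starting at "esta"
# itself, so it can never see a preceding "es ".
LEADING = re.compile(r"(?!es )esta(s?)(!|\?)")

# Lookahead placed AFTER the word: it inspects what follows "esta"/"estas".
TRAILING = re.compile(r"(¡|¿)esta(s?)(?! es| son)")


def matched(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(matched(LEADING, "es esta!"))            # still matches "esta!"
    print(matched(TRAILING, "¿esta es la casa"))   # rejected by the lookahead
    print(matched(TRAILING, "¿esta casa"))         # accepted
    print(re.search(r"(?!x)x", "xxx"))             # can never match
```

The leading form still matches “esta!” after “es ” because, at the position where “esta” starts, the next three characters are “est”, not “es ”.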
    December 1, 2007 at 8:05 am #5080
    jugaor
    Participant
    THANK YOU VERY MUCH! I tried beta 41 and the ‘special chars’ issue is gone! :D
    Also, I see now that I misunderstood the “look ahead” expression :-?
    I needed to use the “look behind” one, (?<!pattern). Sorry!
    Congratulations on your excellent work!
    jugaor
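For readers following along, here is one way the look-behind version might look. It is a sketch in Python's `re` module, not a tested EmEditor expression, and the pattern is an assumption about what the poster intended. Python requires each look-behind to have a fixed width, so “es ” and “son ” are split into two separate assertions. The `\b` inside each one is an extra assumption so that only the whole words “es” and “son” block a match.

```python
import re

# Match "esta"/"estas" followed by "!" or "?", unless the whole word
# "es" or "son" (plus a space) comes immediately before it.
PATTERN = re.compile(r"(?<!\bes )(?<!\bson )estas?[!?]")


def find_hits(text):
    """Return every substring of `text` accepted by PATTERN."""
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for sample in ("es esta!", "son estas?", "¡mira esta!", "antes esta!"):
        print(sample, find_hits(sample))
```

In the last sample, “antes” ends in “es” but is not the word “es”, so the match is kept.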

https://techidaily.com
  • Title: Exploring the Uncommon Features in EmEditor's Text Manipulation Capabilities
  • Author: Scott
  • Created at : 2024-10-15 16:28:58
  • Updated at : 2024-10-17 16:43:24
  • Link: https://win-top.techidaily.com/exploring-the-uncommon-features-in-emeditors-text-manipulation-capabilities/
  • License: This work is licensed under CC BY-NC-SA 4.0.