Add some words with zero-width non-joiner (zwnj) for Farsi #21903

Closed

@nshayanfar nshayanfar commented Dec 7, 2024

Context

There are many errors in the file. I will try to fix them gradually in the future. The most important issues are these:

  • In correct Farsi, the zwnj (zero-width non-joiner) is used extensively, but many people (being lazy) use a space instead. Nowadays, and especially in copy created by AI, the zwnj appears frequently. Therefore, many of these words should be present in both their zwnj and space forms to be identified correctly.
  • Some strings consisted of multiple words without any spaces between them. This is probably a mistake by someone not familiar with the language. It's understandable but wrong.
  • Unlike in English, we almost never write vowels in words (they are simply omitted and inferred). Some of the words contained vowels, and some still do. These should be removed. This sometimes extends to some non-vowels as well.
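
The first point above can be illustrated with a minimal sketch. This is not the plugin's actual code: the function name `with_spacing_variants` and the sample word are hypothetical, and the sketch only assumes that a function-word list is a plain list of strings in which the zwnj is the Unicode character U+200C.

```python
# Zero-width non-joiner, the character discussed above (U+200C).
ZWNJ = "\u200c"


def with_spacing_variants(words):
    """Return every word plus a space-separated variant for words containing ZWNJ.

    Hypothetical helper: order is preserved and duplicates are dropped.
    """
    seen = set()
    result = []
    for word in words:
        variants = [word]
        if ZWNJ in word:
            # Many writers type a plain space where the ZWNJ belongs,
            # so the list needs both spellings to match real text.
            variants.append(word.replace(ZWNJ, " "))
        for variant in variants:
            if variant not in seen:
                seen.add(variant)
                result.append(variant)
    return result


# Example with one auxiliary verb written with ZWNJ (escaped for visibility).
expanded = with_spacing_variants(["می\u200cشود"])
```

Here `expanded` holds the original zwnj spelling followed by the same word with a regular space, so a text using either convention would match the list.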

Summary

This PR can be summarized in the following changelog entry:

  • Improves keyphrase recognition in Farsi by updating the function words list

Relevant technical choices:

  • Added some adjectives and adverbs
  • Added some auxiliary verbs
  • Added some more popular forms of intensifiers
  • Removed non-written ی in prepositions

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

  • Navigate to sections of the plugin where transition words are used in Farsi.
  • Verify that the new transition words appear correctly in the analysis and are recognized.
  • Verify that there are no regressions in the Farsi language analysis.

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change. For example, comments in the Relevant technical choices, comments in the code, documentation on Confluence / shared Google Drive / Yoast developer portal, or other.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes #

Added some adjectives and adverbs
Added some auxiliary verbs
Added some more popular forms of intensifiers
Removed non-written ی in prepositions
@nshayanfar nshayanfar changed the title from "Added some words with zero-width non-joiner (zwnj)" to "Added some words with zero-width non-joiner (zwnj) for Farsi" Dec 9, 2024
@mhkuu
Contributor

mhkuu commented Dec 10, 2024

Thanks @nshayanfar for your pull request! Our team will have a look soon and will return to you if they have any questions.

@hannaw93 hannaw93 self-requested a review January 7, 2025 07:51
@hannaw93 hannaw93 self-assigned this Jan 7, 2025
@hannaw93
Contributor

Extending PR: #21958

@hannaw93 hannaw93 removed their request for review January 10, 2025 16:42
@hannaw93 hannaw93 changed the title from "Added some words with zero-width non-joiner (zwnj) for Farsi" to "Add some words with zero-width non-joiner (zwnj) for Farsi" Jan 10, 2025
@hannaw93 hannaw93 removed their assignment Jan 10, 2025
@hannaw93 hannaw93 closed this Jan 13, 2025
@hannaw93
Contributor

Changes from this PR will be implemented in the corresponding expanding PR.
