n_letter_words and a personal (publicly available) package

How I created a handy function and a personal package

R
package development
Author: Ella Kaye

Published: June 17, 2017

There’s a little R function that I wrote and packaged up to generate a vector or data frame of words of a given length. I find it useful in a wide variety of contexts and thought others might too. To kick off my new blog, here’s a post about it.

The function, n_letter_words, came about because I wanted to generate row and column names for a large matrix. It didn’t matter what they were, as long as they were unique. Since I was in the habit of using the built-in LETTERS vector to do this for small matrices, I naturally thought of using combinations of letters for the larger case. In figuring out how to do this, as is so often the case, it was Stack Overflow to the rescue. There, I learnt about expand.grid and could then use some tidyverse tools to get the vector I was after:

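library(dplyr)
library(tidyr)
# All ordered pairs of letters, one pair per row (the first column varies fastest)
out <- expand.grid(LETTERS, LETTERS) %>%
  as_tibble() %>%
  # Paste the two columns together into a single column called `word`
  unite(word, 1:2, sep = "") %>%
  # Extract that column as a character vector
  pull()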
c(head(out), tail(out))
 [1] "AA" "BA" "CA" "DA" "EA" "FA" "UZ" "VZ" "WZ" "XZ" "YZ" "ZZ"

Sorted! At least I thought so until, a couple of months later, I wanted to generate names for a 1000*1000 matrix and realised both that I’d forgotten the expand.grid trick and, once I’d re-found the Stack Overflow post, that it didn’t give me enough words: two letters yield only 26\(^2\) = 676 of them. That was enough to make it worth writing a function, taking n as an argument, that gives all ‘words’ of length \(n\).

Writing functions always makes me think of what other arguments might be useful. What if we want something between the 676 two-letter words and 17,576 three-letter words (or the 456,976 four-letter words, etc)? Hence the argument num_letters, which can be set between 1 and 26, and results in a total of num_letters\(^n\) words. By default, the function returns a tibble, but setting as_vector = TRUE does what you’d expect. And I threw in a case argument too.
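For readers curious how these arguments could fit together, here is a minimal base R sketch. It is not the source of the function in EMK: the name n_letter_words_sketch, the argument defaults and the plain data frame return value (rather than a tibble) are all assumptions made for illustration.

```r
# Hypothetical sketch, NOT the EMK implementation: all 'words' of length n
# built from the first num_letters letters of the alphabet.
n_letter_words_sketch <- function(n, num_letters = 26,
                                  case = c("upper", "lower"),
                                  as_vector = FALSE) {
  case <- match.arg(case)
  stopifnot(n >= 1, num_letters >= 1, num_letters <= 26)

  # Pick the alphabet and keep only its first num_letters letters
  alphabet <- if (case == "upper") LETTERS else letters
  alphabet <- alphabet[seq_len(num_letters)]

  # One copy of the alphabet per position; expand.grid() varies the
  # first position fastest, so the words run AA, BA, CA, ...
  grid <- expand.grid(rep(list(alphabet), n), stringsAsFactors = FALSE)

  # Paste the n columns together row by row
  words <- do.call(paste0, unname(as.list(grid)))

  # Assumption: a one-column data frame stands in for the tibble
  if (as_vector) words else data.frame(word = words, stringsAsFactors = FALSE)
}

# Three-letter words from the first ten lower-case letters: 10^3 of them
length(n_letter_words_sketch(3, num_letters = 10, case = "lower", as_vector = TRUE))
```

Passing a list of n identical alphabets to expand.grid is what generalises the two-letter trick above to any word length.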

Now that I had my function, what to do with it? I remembered articles I’d read about the usefulness of making and sharing a personal package. Now seemed like the time to do that myself.

So, here is my personal package, EMK. If you think that n_letter_words might be of use to you, then feel free to install!

devtools::install_github("EllaKaye/EMK")

Some examples of n_letter_words:

library(EMK)

n_letter_words(2)
# A tibble: 676 × 1
   word 
   <chr>
 1 AA   
 2 BA   
 3 CA   
 4 DA   
 5 EA   
 6 FA   
 7 GA   
 8 HA   
 9 IA   
10 JA   
# … with 666 more rows
some_three_letter_words <- n_letter_words(
  n = 3, 
  num_letters = 10, 
  case = "lower", 
  as_vector = TRUE
)

c(head(some_three_letter_words), tail(some_three_letter_words))
 [1] "aaa" "baa" "caa" "daa" "eaa" "faa" "ejj" "fjj" "gjj" "hjj" "ijj" "jjj"
length(some_three_letter_words)
[1] 1000

For now, my personal package has only this one function, but watch this space! No doubt I’ll be adding more that I find useful. Perhaps you’ll find them useful too.

Incidentally, none of the above would have happened if I’d just thought, for my test matrix A, to set dimnames(A) <- list(1:nrow(A), 1:ncol(A))!
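As a small illustration of that shortcut, the sketch below applies it to a stand-in matrix. The 3 x 4 size is an arbitrary choice for the example; the assignment is the one quoted above. The integer indices are stored as character names, so they are unique by construction.

```r
# Small stand-in for the test matrix A (the size is arbitrary)
A <- matrix(0, nrow = 3, ncol = 4)

# Use the row and column indices themselves as names
dimnames(A) <- list(1:nrow(A), 1:ncol(A))

# The indices are coerced to character when stored as dimnames
rownames(A)
colnames(A)

# No duplicated names, since every index appears exactly once
anyDuplicated(rownames(A)) == 0
```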

Last updated

2022-11-11 21:27:19 GMT

Details

Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23)
 os       macOS Monterey 12.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/London
 date     2022-11-11
 pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
 quarto   1.2.247 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package     * version date (UTC) lib source
 P dplyr       * 1.0.10  2022-09-01 [?] CRAN (R 4.2.0)
 P EMK         * 0.1.0   2022-08-16 [?] Github (EllaKaye/EMK@a28e89e)
 P sessioninfo * 1.2.2   2021-12-06 [?] CRAN (R 4.2.0)
 P tidyr       * 1.2.0   2022-02-01 [?] CRAN (R 4.2.0)

 [1] /private/var/folders/xf/jb2591gj41xbj0c4y2d8_7ch0000gn/T/RtmpsJIstF/renv-library-1e576ca13792
 [2] /Users/ellakaye/Rprojs/mine/ellakaye-quarto/renv/library/R-4.2/aarch64-apple-darwin20
 [3] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library

 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────

Citation

BibTeX citation:
@online{kaye2017,
  author = {Ella Kaye},
  title = {`n\_letter\_words` and a Personal (Publicly Available)
    Package},
  date = {2017-06-17},
  url = {https://ellakaye.co.uk/posts/2017-06-17_n-letter-words},
  langid = {en}
}
For attribution, please cite this work as:
Ella Kaye. 2017. “`n_letter_words` and a Personal (Publicly Available) Package.” June 17, 2017. https://ellakaye.co.uk/posts/2017-06-17_n-letter-words.