How I created a handy function and a personal package
There’s a little R function that I wrote and packaged up to generate a vector or data frame of words of a given length. I find it useful in a wide variety of contexts and thought others might too. To kick off my new blog, here’s a post about it.
The function, n_letter_words, came about because I wanted to generate row and column names for a large matrix. It didn’t matter what they were, as long as they were unique. Since I was in the habit of using the built-in LETTERS vector to do this for small matrices, I naturally thought of using combinations of letters for the larger case. In figuring out how, as is so often the case, it was Stack Overflow to the rescue. There, I learnt about expand.grid and could then use some tidyverse tools to get the vector I was after:
library(tidyverse)
out <- expand.grid(LETTERS, LETTERS) %>%
  as_tibble() %>%
  unite(word, 1:2, sep = "") %>%
  pull()
c(head(out), tail(out))
[1] "AA" "BA" "CA" "DA" "EA" "FA" "UZ" "VZ" "WZ" "XZ" "YZ" "ZZ"
Sorted! At least I thought so until, a couple of months later, I wanted to generate names for a 1000 × 1000 matrix and realised both that I’d forgotten the expand.grid trick and, once I’d re-found the Stack Overflow post, that it didn’t give me enough words: two letters give only 676 combinations. That was enough to make it worth writing a function, taking n as an argument, that gives all ‘words’ of length \(n\).
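As a rough illustration of how the two-letter trick generalises to length \(n\), here is a minimal base-R sketch. The name all_words is hypothetical and this is not the code in the EMK package; it simply hands expand.grid one copy of LETTERS per position and pastes the columns together.

```r
# Hypothetical helper, not the EMK implementation:
# all 'words' of length n built from the built-in LETTERS vector.
all_words <- function(n) {
  # One copy of LETTERS for each position in the word
  grid <- expand.grid(rep(list(LETTERS), n), stringsAsFactors = FALSE)
  # Paste the columns together row by row, with no separator
  do.call(paste0, unname(as.list(grid)))
}

three <- all_words(3)
length(three)    # 17576, i.e. 26^3
head(three, 3)   # "AAA" "BAA" "CAA" (the first position varies fastest)
```

As with the two-letter version, the first letter changes fastest, so the words are not in alphabetical order; sort() would fix that if it matters.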
Writing functions always makes me think of what other arguments might be useful. What if we want something between the 676 two-letter words and 17,576 three-letter words (or the 456,976 four-letter words, etc.)? Hence the argument num_letters, which can be set between 1 and 26 and restricts the alphabet to its first num_letters letters, giving a total of \(\text{num_letters}^n\) words. By default, the function returns a tibble, but setting as_vector = TRUE does what you’d expect. And I threw in a case argument too.
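To make those arguments concrete, here is a hedged sketch of how such a function could be put together. It is not the source of n_letter_words: the name n_letter_words_sketch is made up, it returns a base data frame rather than a tibble to stay dependency-free, and treating "upper" as the default value of case is an assumption (the post only shows case = "lower").

```r
# Hypothetical re-implementation for illustration only; see the EMK
# package for the real n_letter_words.
n_letter_words_sketch <- function(n, num_letters = 26,
                                  case = c("upper", "lower"),
                                  as_vector = FALSE) {
  case <- match.arg(case)  # assumption: "upper" is the default
  stopifnot(n >= 1, num_letters >= 1, num_letters <= 26)

  # Keep only the first num_letters letters of the chosen alphabet
  alphabet <- if (case == "upper") LETTERS else letters
  alphabet <- alphabet[seq_len(num_letters)]

  # All combinations of n letters, pasted together row by row
  grid <- expand.grid(rep(list(alphabet), n), stringsAsFactors = FALSE)
  words <- do.call(paste0, unname(as.list(grid)))

  if (as_vector) words else data.frame(word = words, stringsAsFactors = FALSE)
}

# 10 letters and n = 3 give 10^3 = 1000 words
length(n_letter_words_sketch(3, num_letters = 10, case = "lower", as_vector = TRUE))
```

Choosing num_letters is then just a matter of picking a value whose \(n\)-th power is at least the number of names you need.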
Now that I had my function, what to do with it? I remembered articles I’d read about the usefulness of making and sharing a personal package. Now seemed like the time to do that myself.
So, here is my personal package, EMK. If you think that n_letter_words might be of use to you, then feel free to install!
devtools::install_github("EllaKaye/EMK")
Some examples of n_letter_words:
library(EMK)
n_letter_words(2)
# A tibble: 676 x 1
   word 
   <chr>
 1 AA   
 2 BA   
 3 CA   
 4 DA   
 5 EA   
 6 FA   
 7 GA   
 8 HA   
 9 IA   
10 JA   
# … with 666 more rows
some_three_letter_words <- n_letter_words(
3, num_letters = 10,
case = "lower",
as_vector = TRUE
)
c(head(some_three_letter_words), tail(some_three_letter_words))
[1] "aaa" "baa" "caa" "daa" "eaa" "faa" "ejj" "fjj" "gjj" "hjj" "ijj"
[12] "jjj"
length(some_three_letter_words)
[1] 1000
For now, my personal package has only this one function, but watch this space! No doubt I’ll be adding more that I find useful. Perhaps you’ll find them useful too.
Incidentally, none of the above would have happened if I’d just thought, for my test matrix A, to set dimnames(A) <- list(1:nrow(A), 1:ncol(A))!
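For completeness, a small sketch of that simpler approach on a toy matrix. The 3 × 4 size is arbitrary, and seq_len() is used in place of 1:nrow(A) only because it also behaves sensibly for a matrix with zero rows or columns.

```r
# Toy matrix standing in for the large test matrix A
A <- matrix(0, nrow = 3, ncol = 4)

# Unique row and column names from the indices themselves;
# dimnames<- coerces the integers to character
dimnames(A) <- list(seq_len(nrow(A)), seq_len(ncol(A)))

rownames(A)   # "1" "2" "3"
colnames(A)   # "1" "2" "3" "4"
```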
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/EllaKaye/ellakaye-distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kaye (2017, June 17). ELLA KAYE: n_letter_words and a personal (publicly available) package. Retrieved from https://ellakaye.rbind.io/posts/2017-06-17-n-letter-words/
BibTeX citation
@misc{kaye2017n_letter_words,
  author = {Kaye, Ella},
  title = {ELLA KAYE: n_letter_words and a personal (publicly available) package},
  url = {https://ellakaye.rbind.io/posts/2017-06-17-n-letter-words/},
  year = {2017}
}