n_letter_words and a personal (publicly available) package
How I created a handy function and a personal package
June 17, 2017
There’s a little R function that I wrote and packaged up to generate a vector or data frame of words of a given length. I find it useful in a wide variety of contexts and thought others might too. To kick off my new blog, here’s a post about it.
The function, n_letter_words, came about because I wanted to be able to generate row and column names for a large matrix - didn’t matter what they were, as long as they were unique. Since I was in the habit of using the built-in LETTERS vector to do this for small matrices, I naturally thought of using combinations of letters to do this in a larger case. In figuring out how to do this, as is so often the case, it was stackoverflow to the rescue. There, I learnt about expand.grid and could then use some tidyverse tools to get the vector I was after:
[1] "AA" "BA" "CA" "DA" "EA" "FA" "UZ" "VZ" "WZ" "XZ" "YZ" "ZZ"
Sorted! At least I thought so, until, a couple of months later, when I wanted to generate names for a 1000*1000 matrix, and realised both that I’d forgotten the expand.grid trick, and once I’d re-found the stackoverflow post, that it didn’t give me enough words. That was enough to make it worth writing a function, taking n as an argument, that gives all ‘words’ of length \(n\).
Writing functions always makes me think of what other arguments might be useful. What if we want something between the 676 two-letter words and 17,576 three-letter words (or the 456,976 four-letter words, etc)? Hence the argument num_letters, which can be set between 1 and 26, and results in a total of num_letters\(^n\) words. By default, the function returns a tibble, but setting as_vector = TRUE does what you’d expect. And I threw in a case argument too.
Now that I had my function, what to do with it? I remembered articles I’d read about the usefulness of making and sharing a personal package. Now seemed like the time to do that myself.
So, here is my personal package, EMK. If you think that n_letter_words might be of use to you, then feel free to install!
Some examples of n_letter_words:
# A tibble: 676 × 1
word
<chr>
1 AA
2 BA
3 CA
4 DA
5 EA
6 FA
7 GA
8 HA
9 IA
10 JA
# … with 666 more rows
[1] "aaa" "baa" "caa" "daa" "eaa" "faa" "ejj" "fjj" "gjj" "hjj" "ijj" "jjj"
[1] 1000
For now, my personal package has only this one function, but watch this space! No doubt I’ll be adding more that I find useful. Perhaps you’ll find them useful too.
Incidentally, none of the above would have happened if I’d just thought, for my test matrix A, to set dimnames(A) <- list(1:nrow(A), 1:ncol(A))!
2022-11-11 21:27:19 GMT
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.1 (2022-06-23)
os macOS Monterey 12.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/London
date 2022-11-11
pandoc 2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
quarto 1.2.247 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
P dplyr * 1.0.10 2022-09-01 [?] CRAN (R 4.2.0)
P EMK * 0.1.0 2022-08-16 [?] Github (EllaKaye/EMK@a28e89e)
P sessioninfo * 1.2.2 2021-12-06 [?] CRAN (R 4.2.0)
P tidyr * 1.2.0 2022-02-01 [?] CRAN (R 4.2.0)
[1] /private/var/folders/xf/jb2591gj41xbj0c4y2d8_7ch0000gn/T/RtmpsJIstF/renv-library-1e576ca13792
[2] /Users/ellakaye/Rprojs/mine/ellakaye-quarto/renv/library/R-4.2/aarch64-apple-darwin20
[3] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────
@online{kaye2017,
author = {Kaye, Ella},
title = {`N\_letter\_words` and a Personal (Publicly Available)
Package},
date = {2017-06-17},
url = {https://ellakaye.co.uk/posts/2017-06-17_n-letter-words/},
langid = {en}
}
---
title: "`n_letter_words` and a personal (publicly available) package"
description: |
  How I created a handy function and a personal package
date: 2017-06-17
author:
  - name: Ella Kaye
    ##url: http://twitter.com/ellakaye
site-url: https://ellakaye.co.uk
image: ../../images/hex/EMK.png
image-alt: |
  A hexagon with a bright pink outline. Inside it, the letters E, M and K in the
  same bright pink, on a white background
categories:
  - R
  - package development
open-graph: true
twitter-card: true
# uncomment lines below for different title and description to post
# title: |
# description: |
# uncomment for different image to post
# image: |
# image-alt: |
# defaults to 500 x 500 summary: uncomment lines below for large card
# image-width: 600
# image-height: 314
# card-style: summary_large_image
# image-width: 500
# image-height: 500
# card-style: summary
draft: false
---
```{r}
#| echo: false
#| results: 'hide'
long_slug <- "2017-06-17_n-letter-words"
# NOTE: after finishing post, run renv::snapshot() and copy the renv.lock file
# from the project root into the post directory
renv::use(lockfile = "renv.lock")
```
There's a little R function that I wrote and packaged up to generate a vector or data frame of words of a given length. I find it useful in a wide variety of contexts and thought others might too. To kick off my new blog, here's a post about it.
The function, `n_letter_words`, came about because I wanted to be able to generate row and column names for a large matrix - didn't matter what they were, as long as they were unique. Since I was in the habit of using the built-in `LETTERS` vector to do this for small matrices, I naturally thought of using combinations of letters to do this in a larger case. In figuring out how to do this, as is so often the case, it was [stackoverflow](https://stackoverflow.com/questions/11388359/unique-combination-of-all-elements-from-two-or-more-vectors) to the rescue. There, I learnt about `expand.grid` and could then use some tidyverse tools to get the vector I was after:
```{r warning=FALSE, message=FALSE}
library(dplyr)
library(tidyr)
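# all ordered pairs of letters, one row per pair (first column varies fastest)
out <- expand.grid(LETTERS, LETTERS) %>%
  as_tibble() %>%
  # paste the two columns together into a single `word` column
  unite(word, 1:2, sep = "") %>%
  # extract that column as a character vector
  pull()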
c(head(out), tail(out))
```
Sorted! At least I thought so, until, a couple of months later, when I wanted to generate names for a 1000\*1000 matrix, and realised both that I'd forgotten the `expand.grid` trick, and once I'd re-found the stackoverflow post, that it didn't give me enough words. That was enough to make it worth writing a function, taking `n` as an argument, that gives all 'words' of length $n$.
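To make the "not enough words" problem concrete, here is a small sketch that finds the shortest word length giving at least as many words as names needed. The helper `min_word_length` is a hypothetical name for illustration only; it is not part of `EMK`.

```r
# Hypothetical helper (not in EMK): the smallest n such that
# num_letters^n is at least the number of names required.
min_word_length <- function(n_names, num_letters = 26) {
  stopifnot(n_names >= 1, num_letters >= 2)
  n <- 1
  # integer-style loop rather than log(), to avoid floating-point edge cases
  while (num_letters^n < n_names) {
    n <- n + 1
  }
  n
}

min_word_length(676)   # 2: the two-letter words are just enough
min_word_length(1000)  # 3: 676 two-letter words fall short of 1000
```

So a 1000\*1000 matrix needs three-letter words, of which there are far more than required.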
Writing functions always makes me think of what other arguments might be useful. What if we want something between the 676 two-letter words and 17,576 three-letter words (or the 456,976 four-letter words, etc)? Hence the argument `num_letters`, which can be set between 1 and 26, and results in a total of `num_letters`$^n$ words. By default, the function returns a `tibble`, but setting `as_vector = TRUE` does what you'd expect. And I threw in a `case` argument too.
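For anyone curious how a function with these arguments could be put together, below is a minimal base-R sketch. It is an assumption about one possible approach, not the code in `EMK`: the name `letter_words` is hypothetical, and it returns a plain data frame by default rather than a tibble so that it needs no packages.

```r
# A base-R sketch of a function like n_letter_words(); `letter_words` is a
# hypothetical name and this is not the implementation in the EMK package.
letter_words <- function(n, num_letters = 26, case = c("upper", "lower"),
                         as_vector = FALSE) {
  stopifnot(n >= 1, num_letters >= 1, num_letters <= 26)
  case <- match.arg(case)
  alphabet <- if (case == "upper") LETTERS else letters
  alphabet <- alphabet[seq_len(num_letters)]

  # One copy of the alphabet per position. expand.grid() varies the first
  # column fastest, which gives the "AA", "BA", "CA", ... ordering.
  grid <- expand.grid(rep(list(alphabet), n), stringsAsFactors = FALSE)
  words <- do.call(paste0, grid)

  if (as_vector) words else data.frame(word = words, stringsAsFactors = FALSE)
}

words <- letter_words(3, num_letters = 10, case = "lower", as_vector = TRUE)
length(words)  # 1000, i.e. 10^3
c(head(words, 3), tail(words, 3))
```

Passing `rep(list(alphabet), n)` to `expand.grid` is what generalises the two-letter trick to any `n`.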
Now that I had my function, what to do with it? I remembered articles I'd read about the usefulness of [making](https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/) and [sharing](https://hilaryparker.com/2013/04/03/personal-r-packages/) a personal package. Now seemed like the time to do that myself.
So, [here](https://github.com/EllaKaye/EMK) is my personal package, `EMK`. If you think that `n_letter_words` might be of use to you, then feel free to install!
```{r message=FALSE, eval = FALSE}
devtools::install_github("EllaKaye/EMK")
```
Some examples of `n_letter_words`:
```{r warning = FALSE, message = FALSE}
library(EMK)
n_letter_words(2)
some_three_letter_words <- n_letter_words(
  n = 3,
  num_letters = 10,
  case = "lower",
  as_vector = TRUE
)
c(head(some_three_letter_words), tail(some_three_letter_words))
length(some_three_letter_words)
```
For now, my personal package has only this one function, but watch this space! No doubt I'll be adding more that I find useful. Perhaps you'll find them useful too.
Incidentally, none of the above would have happened if I'd just thought, for my test matrix `A`, to set `dimnames(A) <- list(1:nrow(A), 1:ncol(A))`!
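For completeness, here is that simpler approach on a small stand-in matrix (the 4 by 3 size is just for illustration). The integer names are stored as character strings, so they can be used for indexing by name.

```r
# A small test matrix; a 1000 x 1000 one works the same way.
A <- matrix(0, nrow = 4, ncol = 3)

# Integer names are coerced to character and are unique by construction.
dimnames(A) <- list(1:nrow(A), 1:ncol(A))

rownames(A)  # "1" "2" "3" "4"
A["2", "3"]  # index by name rather than position
```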
<!--------------- appendices go here ----------------->
```{r appendix}
#| echo: false
source("../../R/appendix.R")
insert_appendix(
  repo_spec = "EllaKaye/ellakaye.co.uk",
  name = long_slug
)
```
#### Session info {.appendix}
<details><summary>Toggle</summary>
```{r}
#| echo: false
library(sessioninfo)
# save the session info as an object
pkg_session <- session_info(pkgs = "attached")
# get the quarto version
quarto_version <- system("quarto --version", intern = TRUE)
# inject the quarto info, reusing the version captured above
pkg_session$platform$quarto <- paste(
  quarto_version,
  "@",
  quarto::quarto_path()
)
# print it out
pkg_session
```
</details>