name: r-anti-slop
description: >
  Enforce production-quality R code standards. Prevents generic AI patterns
  through namespace qualification, explicit returns, and tidyverse conventions.
  Use when writing or reviewing R code for data analysis or packages.
applies_to:
  - "**/*.R"
  - "**/*.Rmd"
  - "**/*.qmd"
tags: [r, tidyverse, code-quality, data-science]
related_skills:
  - quarto/anti-slop
  - text/anti-slop
version: 2.0.0
R Anti-Slop: Stop Writing df <- data
When to Use This
Use this for:
- ✓ Any R code leaving your machine (analysis, packages, scripts)
- ✓ AI-generated code review (catches `df`, `result`, missing `::`)
- ✓ CRAN submissions (they'll reject generic code anyway)
- ✓ Team code standards
Skip for:
- Quick console experiments (though habits form fast)
- Legacy code you can't touch
- Bioconductor or other style guides that override this
Quick Example
Before (AI Slop):
# Load the library
library(dplyr)
# Read the data
df <- read.csv("data.csv")
# Filter the data
result <- df %>% filter(x > 0)
After (Anti-Slop):
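load_active_customers <- function(path = "data/customers.csv") {
  customer_data <- readr::read_csv(path)
  active_customers <- customer_data |>
    dplyr::filter(status == "active", revenue > 0)
  # return() is only valid inside a function; at top level it raises an error
  return(active_customers)
}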
What changed:
- ✓ Descriptive names (`customer_data` not `df`)
- ✓ Namespace qualification (`dplyr::`, `readr::`)
- ✓ Native pipe (`|>` not `%>%`)
- ✓ No obvious comments
- ✓ Explicit return
When to Use What
| If you need to... | Do this | Details |
|---|---|---|
| Name variables | Use snake_case, no df/data/result | reference/naming.md |
| Call tidyverse functions | Always use :: (e.g., dplyr::filter()) | reference/tidyverse.md |
| Return from function | Always explicit return() statement | reference/naming.md |
| Write pipe chains | Use \|>, break at 8+ operations | reference/tidyverse.md |
| Document functions | Specific @param, @return, no circular text | reference/documentation.md |
| Handle missing data | Explicit strategy + report data loss | reference/statistical-rigor.md |
| Validate data | Check assumptions with stopifnot() | reference/statistical-rigor.md |
| Format code | Use styler::style_file() | reference/tidyverse.md |
| Check code quality | Use lintr::lint() | reference/tidyverse.md |
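The "handle missing data" and "validate data" rows can be combined in one small step. The sketch below is illustrative only: `drop_missing_revenue()`, `raw_customers`, and the `revenue` column are hypothetical names, and it uses base R so it runs without extra packages.

```r
# Hypothetical helper: drop rows with a missing `revenue` value and report the loss.
drop_missing_revenue <- function(customer_data) {
  # Check assumptions about the input before touching it.
  stopifnot(
    is.data.frame(customer_data),
    "revenue" %in% names(customer_data)
  )

  rows_before <- nrow(customer_data)
  complete_customers <- customer_data[!is.na(customer_data$revenue), , drop = FALSE]
  rows_dropped <- rows_before - nrow(complete_customers)

  # Report data loss instead of dropping rows silently.
  message(sprintf(
    "Dropped %d of %d rows with missing revenue (%.1f%%)",
    rows_dropped, rows_before, 100 * rows_dropped / rows_before
  ))

  # Validate the result after the transformation.
  stopifnot(!anyNA(complete_customers$revenue))
  return(complete_customers)
}

raw_customers <- data.frame(
  customer_id = 1:4,
  revenue = c(120, NA, 75, NA)
)
complete_customers <- drop_missing_revenue(raw_customers)
```

Dropping rows is only one possible strategy; the point is that the choice is explicit and the number of affected rows is reported.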
Core Workflow
5-Step Quality Check
1. Namespace qualification - All external functions use `::`

       # Good
       dplyr::filter(data, x > 0)

       # Bad
       filter(data, x > 0)

2. Explicit returns - Every function has `return()`

       # Good
       my_function <- function(x) {
         result <- x + 1
         return(result)
       }

       # Bad
       my_function <- function(x) {
         x + 1
       }

3. Naming conventions - All objects use `snake_case`

       # Good
       customer_lifetime_value <- calculate_clv(data)

       # Bad
       df <- calculate_clv(data)
       customerLifetimeValue <- calculate_clv(data)

4. Documentation quality - No generic descriptions

       # Good
       #' @param deaths Data frame with `age_group` and `count` columns

       # Bad
       #' @param data The data

5. Code formatting - Run styler and lintr

       styler::style_file("script.R")
       lintr::lint("script.R")
Quick Reference Checklist
Before committing R code, verify:
- All external functions qualified with `::`
- All functions have explicit `return()`
- All objects use `snake_case`
- No generic names (`df`, `data`, `result`, `temp`)
- Pipes (`|>`) have a space before them and end the line
- Long pipelines (>8 ops) broken into named steps
- Complex operations have WHY comments
- Data validated after transformations
- Seeds set before random operations
- Uncertainty reported (SE, CI) for statistical models
- No `attach()` calls
- No right-hand assignment (`->`)
- Roxygen documentation is specific
- Examples are realistic and run
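Two of the checklist items, setting seeds and reporting uncertainty, are easy to show together. This is a minimal sketch on simulated data; the variable names, the seed value, and the simulated relationship are all assumptions made for illustration.

```r
# Set the seed before any random operation so the simulation is reproducible.
set.seed(2024)

# Simulated data (hypothetical): revenue rises by about 3 per month of tenure.
tenure_months <- runif(200, min = 1, max = 60)
simulated_customers <- data.frame(
  tenure_months = tenure_months,
  revenue = 50 + 3 * tenure_months + rnorm(200, sd = 20)
)

revenue_model <- lm(revenue ~ tenure_months, data = simulated_customers)

# Report the estimate together with its standard error and 95% interval.
coefficient_table <- summary(revenue_model)$coefficients
tenure_estimate <- coefficient_table["tenure_months", "Estimate"]
tenure_se <- coefficient_table["tenure_months", "Std. Error"]
tenure_ci <- confint(revenue_model, "tenure_months", level = 0.95)

message(sprintf(
  "Tenure effect: %.2f (SE %.2f, 95%% CI %.2f to %.2f)",
  tenure_estimate, tenure_se, tenure_ci[1, 1], tenure_ci[1, 2]
))
```

`lm()` and `confint()` come from stats, so they fall under the namespace exceptions listed in the mandatory rules.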
Common Workflows
Workflow 1: Clean Up AI-Generated R Script
Context: AI generated an analysis script with generic patterns.
Steps:
1. Run detection script

       Rscript toolkit/scripts/detect_slop.R analysis.R --verbose

2. Fix high-priority issues first

       # Replace df, data, result with descriptive names

       # Before
       df <- readr::read_csv("data.csv")
       result <- df %>% filter(x > 0)

       # After
       customer_data <- readr::read_csv("data/customers.csv")
       active_customers <- customer_data |>
         dplyr::filter(status == "active")

3. Add namespace qualification

       # Before
       data %>%
         filter(x > 0) %>%
         summarize(mean(y))

       # After
       data |>
         dplyr::filter(x > 0) |>
         dplyr::summarize(mean_y = mean(y))

4. Add explicit returns

       # Before
       calculate_rate <- function(numerator, denominator) {
         numerator / denominator
       }

       # After
       calculate_rate <- function(numerator, denominator) {
         rate <- numerator / denominator
         return(rate)
       }

5. Break long pipes

       # Before (12 operations in one chain)
       result <- data |>
         filter(...) |>
         mutate(...) |>
         group_by(...) |>
         summarize(...) |>
         arrange(...) |>
         [7 more ops]

       # After
       clean_data <- data |>
         dplyr::filter(!is.na(value)) |>
         dplyr::mutate(category = categorize(value))

       summary_stats <- clean_data |>
         dplyr::group_by(category) |>
         dplyr::summarize(mean_val = mean(value))

6. Format and validate

       styler::style_file("analysis.R")
       lintr::lint("analysis.R")
Expected outcome: The score reported by the detection script drops from 60+ to below 20
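To give a feel for what step 2 looks for, here is a rough sketch of a generic-name check. It is not the toolkit's `detect_slop.R`, whose internals are not shown here; `find_generic_assignments()` and its regular expression are hypothetical and only catch simple assignments at the start of a line.

```r
# Hypothetical sketch; not the toolkit's detect_slop.R.
find_generic_assignments <- function(code_lines) {
  generic_names <- c("df", "data", "result", "temp")

  # Match e.g. `df <- ...` or `result = ...` at the start of a line,
  # but not comparisons such as `df == 1`.
  assignment_pattern <- sprintf(
    "^[[:space:]]*(%s)[[:space:]]*(<-|=)[^=]",
    paste(generic_names, collapse = "|")
  )

  flagged <- grepl(assignment_pattern, code_lines)
  findings <- data.frame(
    line_number = which(flagged),
    code = trimws(code_lines[flagged])
  )
  return(findings)
}

script_lines <- c(
  "df <- readr::read_csv(\"data.csv\")",
  "customer_data <- readr::read_csv(\"data/customers.csv\")",
  "result <- df |> dplyr::filter(x > 0)"
)
generic_findings <- find_generic_assignments(script_lines)
```

On a real file, the same function could be applied to `readLines("analysis.R")`.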
Workflow 2: Fix Generic Package Documentation
Context: R package has generic roxygen documentation.
Steps:
1. Identify generic patterns

       # Bad
       #' Process Data
       #'
       #' @description This function processes the data.
       #' @param data The data.
       #' @return The result.

2. Make description specific

       # Good
       #' Calculate age-adjusted mortality rates
       #'
       #' Computes mortality rates per 100,000 population, standardized to the
       #' 2000 US Census age distribution using direct standardization.

3. Describe parameter structure

       # Good
       #' @param deaths Data frame with columns `age_group` and `count`.
       #' @param population Data frame with columns `age_group` and `pop_size`.

4. Specify return value

       # Good
       #' @return A tibble with columns:
       #' \describe{
       #'   \item{county}{County FIPS code}
       #'   \item{rate}{Age-adjusted rate per 100,000}
       #'   \item{se}{Standard error of the rate}
       #' }

5. Add realistic examples

       # Good
       #' @examples
       #' counties <- data.frame(
       #'   county = c("A", "B"),
       #'   deaths = c(150, 200),
       #'   population = c(50000, 80000)
       #' )
       #'
       #' adjust_rates(counties, rate_per = 100000)
       #' #> # A tibble: 2 x 3
       #' #>   county  rate    se
       #' #> 1 A       312.  25.4
       #' #> 2 B       258.  18.2
Expected outcome: Documentation that teaches, not restates
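Put together, the steps above might produce something like the following. This is a sketch on a deliberately simpler function than the age-adjusted example: `calculate_scaled_rate()` is a hypothetical name, and it computes a crude rate with no age standardization.

```r
#' Calculate an event rate per fixed population size
#'
#' Divides event counts by the population at risk and scales the result,
#' for example to deaths per 100,000 residents. No age standardization
#' is applied.
#'
#' @param event_count Numeric vector of non-negative event counts.
#' @param population_size Numeric vector of populations at risk, each
#'   greater than zero and the same length as `event_count`.
#' @param rate_per Single positive number giving the scale of the rate.
#'
#' @return A numeric vector the same length as `event_count`, holding the
#'   number of events per `rate_per` people.
#'
#' @examples
#' calculate_scaled_rate(c(150, 200), c(50000, 80000), rate_per = 100000)
calculate_scaled_rate <- function(event_count, population_size, rate_per = 100000) {
  stopifnot(
    length(event_count) == length(population_size),
    all(event_count >= 0),
    all(population_size > 0),
    length(rate_per) == 1,
    rate_per > 0
  )
  scaled_rate <- event_count / population_size * rate_per
  return(scaled_rate)
}

county_rates <- calculate_scaled_rate(c(150, 200), c(50000, 80000))
```

Each tag states type, shape, and constraints, so a reader can call the function correctly without opening its body.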
Workflow 3: Prepare Package for CRAN
Context: Final checks before CRAN submission.
Steps:
1. Run all quality checks

       # Standard checks
       devtools::check()

       # Anti-slop checks
       lapply(list.files("R", full.names = TRUE), function(f) {
         system(paste("Rscript toolkit/scripts/detect_slop.R", f))
       })

2. Fix documentation
   - Check all `@param` descriptions are specific
   - Verify `@examples` run and are realistic
   - Ensure `@return` describes structure

3. Validate code quality

       # Format all files
       styler::style_dir("R/")

       # Check lints
       lintr::lint_package()

4. Check CRAN-specific requirements
   - Validate URLs in DESCRIPTION and documentation
   - Check examples run in < 5 seconds
   - Verify package structure meets CRAN standards
Expected outcome: Clean R CMD check with no slop patterns
Mandatory Rules Summary
1. Namespace Qualification
ALWAYS use :: for external packages
Exceptions (don't need ::):
- Base R: `mean()`, `sum()`, `log()`, etc.
- stats: `lm()`, `glm()`, `t.test()`, etc.
- utils: `head()`, `tail()`, `str()`, etc.
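One reason for the rule is name masking: an unqualified call runs whichever attached package defines that name, and `filter` is a well-known case because both stats and dplyr export it. The sketch below stays in base R and qualifies `stats::filter()` only to make the point unambiguous; `daily_orders` is made-up data.

```r
daily_orders <- c(10, 12, 9, 15, 14)

# stats::filter() applies a linear filter (here a 3-day moving average).
# It does not select rows, which is what dplyr::filter() does.
moving_average <- stats::filter(daily_orders, rep(1 / 3, 3))

# Without `::`, the function that runs depends on which packages are attached.
# utils::find() lists every attached package that defines the name.
filter_sources <- utils::find("filter")
```

In a session where dplyr has been attached with `library(dplyr)`, `filter_sources` would be expected to list dplyr as well, and a bare `filter()` call would then resolve to the dplyr version.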
2. Explicit Returns
ALWAYS use return() - never implicit
3. Naming: snake_case
All objects use snake_case
- Variables: `customer_data` not `customerData` or `df`
- Functions: `calculate_rate` not `calculateRate`
- Arguments: `input_data` not `inputData`
4. Native Pipe
Prefer |> over %>% (unless R < 4.1)
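As a small illustration, the native pipe needs no package at all. The example assumes R 4.1 or later, and `order_values` is made-up data.

```r
order_values <- c(120, 75, 310, 42)

# Each step passes its result as the first argument of the next call.
top_two_total <- order_values |>
  sort(decreasing = TRUE) |>
  head(n = 2) |>
  sum()
```

The chain is equivalent to `sum(head(sort(order_values, decreasing = TRUE), n = 2))`, read left to right instead of inside out.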
5. No Generic Names
Never use: df, data, result, temp, x, n (except standard math notation)
Tidyverse Philosophy
Follow the Tidyverse Style Guide as the primary reference:
- Design for humans - Code should be readable and intuitive
- Reuse existing data structures - Work with tibbles and data frames
- Compose simple functions with pipes - Build complexity through composition
- Embrace functional programming - Functions are first-class objects
See reference/tidyverse.md for complete tidyverse conventions.
Resources & Advanced Topics
Reference Files
- reference/naming.md - Complete naming conventions and forbidden patterns
- reference/tidyverse.md - Pipe conventions, formatting, ggplot2 standards
- reference/documentation.md - Roxygen2, vignettes, README quality
- reference/statistical-rigor.md - Validation, uncertainty, reproducibility
- reference/forbidden-patterns.md - Complete antipattern catalog
Related Skills
- text/anti-slop - For cleaning prose in documentation
- quarto/anti-slop - For cleaning vignettes and documentation
Tools
- `styler::style_file()` - Auto-format code
- `lintr::lint()` - Check code quality
- `Rscript toolkit/scripts/detect_slop.R` - Detect AI patterns
Integration with Posit Skills
This skill focuses on code quality and avoiding generic patterns.
Use together with Posit skills for complete coverage:
| Task | Use This Skill | + Posit Skill |
|---|---|---|
| Write error messages | r/anti-slop (quality) | + r-lib/cli (structure) |
| Write tests | r/anti-slop (code quality) | + r-lib/testing (test patterns) |
| Prepare for CRAN | r/anti-slop (no slop) | + r-lib/cran-extrachecks (requirements) |
| Document lifecycle | r/anti-slop (doc quality) | + r-lib/lifecycle (deprecation) |