name: bibtex-management-guide
description: "Clean, format, deduplicate, and manage BibTeX bibliography files for LaTeX"
metadata:
  openclaw:
    emoji: "🗃️"
    category: "writing"
    subcategory: "citation"
    keywords: ["BibTeX formatting", "BibTeX conversion", "bibliography cleanup", "reference deduplication", "citation management"]
    source: "wentor"

BibTeX Management Guide

A skill for maintaining clean, consistent, and complete BibTeX bibliography files. Covers formatting standards, deduplication, common errors, and automated cleanup workflows essential for LaTeX-based academic writing.

BibTeX Entry Standards

Required Fields by Entry Type

% Article in a journal
@article{smith2024deep,
  author    = {Smith, John A. and Doe, Jane B.},
  title     = {Deep Learning for Climate Prediction: A Comparative Study},
  journal   = {Nature Machine Intelligence},
  year      = {2024},
  volume    = {6},
  number    = {3},
  pages     = {234--248},
  doi       = {10.1038/s42256-024-00001-1}
}

% Conference proceedings
@inproceedings{lee2024attention,
  author    = {Lee, Wei and Chen, Li},
  title     = {Attention Mechanisms for Scientific Document Understanding},
  booktitle = {Proceedings of the 62nd Annual Meeting of the ACL},
  year      = {2024},
  pages     = {1123--1135},
  publisher = {Association for Computational Linguistics},
  doi       = {10.18653/v1/2024.acl-main.89}
}

% Book
@book{bishop2006pattern,
  author    = {Bishop, Christopher M.},
  title     = {Pattern Recognition and Machine Learning},
  publisher = {Springer},
  year      = {2006},
  isbn      = {978-0387310732}
}
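
The examples above mix required and optional fields. As a minimal sketch, the fields that standard BibTeX requires for these three entry types can be encoded in a lookup table and checked per entry. The names `REQUIRED_FIELDS` and `missing_required_fields` are illustrative, not part of any library, and the entry dict shape (`type`, `fields`) is assumed to match the parser shown under Deduplication.

```python
# Required fields per entry type in standard BibTeX (illustrative subset).
# A tuple means "at least one of these" (a book needs an author or an editor).
REQUIRED_FIELDS = {
    'article': ['author', 'title', 'journal', 'year'],
    'inproceedings': ['author', 'title', 'booktitle', 'year'],
    'book': [('author', 'editor'), 'title', 'publisher', 'year'],
}


def missing_required_fields(entry: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    missing = []
    for requirement in REQUIRED_FIELDS.get(entry['type'], []):
        alternatives = requirement if isinstance(requirement, tuple) else (requirement,)
        if not any(entry['fields'].get(name, '').strip() for name in alternatives):
            missing.append('/'.join(alternatives))
    return missing


example = {
    'type': 'article',
    'fields': {'author': 'Smith, John A.', 'title': 'Deep Learning', 'year': '2024'},
}
print(missing_required_fields(example))  # prints ['journal']
```

Entry types not listed in the table are treated as having no required fields, so the check stays silent rather than guessing.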

Automated BibTeX Cleanup

Deduplication

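import re


def _read_balanced(text: str, start: int) -> int:
    """Return the index just past the brace group opening at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return i + 1
    return len(text)  # unbalanced input: consume the rest


def parse_bibtex_entries(bib_content: str) -> list[dict]:
    """
    Parse a BibTeX file into structured entries.

    Tracks brace depth so nested braces in values (e.g. {Deep {CNN} Models})
    survive. @comment, @string, and @preamble blocks are skipped, and
    # concatenation is not supported.
    """
    entries = []
    field_start = re.compile(r'(\w+)\s*=\s*')
    for match in re.finditer(r'@(\w+)\s*\{', bib_content):
        entry_type = match.group(1).lower()
        if entry_type in ('comment', 'string', 'preamble'):
            continue
        end = _read_balanced(bib_content, match.end() - 1)
        key, _, fields_str = bib_content[match.end():end - 1].partition(',')
        entry = {'type': entry_type, 'key': key.strip(),
                 'raw': bib_content[match.start():end], 'fields': {}}

        pos = 0
        while (m := field_start.search(fields_str, pos)):
            name, pos = m.group(1).lower(), m.end()
            if fields_str[pos:pos + 1] == '{':
                stop = _read_balanced(fields_str, pos)
                value = fields_str[pos + 1:stop - 1]
            elif fields_str[pos:pos + 1] == '"':
                close = fields_str.find('"', pos + 1)
                stop = len(fields_str) if close == -1 else close + 1
                value = fields_str[pos + 1:stop - 1]
            else:  # bare value such as a number or a month macro
                comma = fields_str.find(',', pos)
                stop = len(fields_str) if comma == -1 else comma
                value = fields_str[pos:stop]
            entry['fields'][name] = value.strip()
            pos = stop

        entries.append(entry)

    return entries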


def deduplicate_bibtex(entries: list[dict]) -> dict:
    """
    Find and remove duplicate BibTeX entries, keeping the first occurrence.

    Deduplication strategy:
    1. Exact DOI match (case-insensitive, resolver URL prefix ignored)
    2. Exact match on the normalized title (lowercased, punctuation and
       braces removed, whitespace collapsed)

    Fuzzy title matching and author/year matching are not implemented here.
    """
    seen_dois = {}
    seen_titles = {}
    duplicates = []
    unique = []

    for entry in entries:
        doi = entry['fields'].get('doi', '').lower().strip()
        doi = re.sub(r'^https?://(dx\.)?doi\.org/', '', doi)
        title = entry['fields'].get('title', '').lower()
        title_normalized = ' '.join(re.sub(r'[^a-z0-9\s]', '', title).split())

        if doi and doi in seen_dois:
            duplicates.append({
                'entry': entry['key'],
                'duplicate_of': seen_dois[doi],
                'reason': 'same DOI'
            })
        elif title_normalized and title_normalized in seen_titles:
            duplicates.append({
                'entry': entry['key'],
                'duplicate_of': seen_titles[title_normalized],
                'reason': 'same title'
            })
        else:
            # Register identifiers only for entries that are kept, so that
            # 'duplicate_of' always names an entry present in the output.
            if doi:
                seen_dois[doi] = entry['key']
            if title_normalized:
                seen_titles[title_normalized] = entry['key']
            unique.append(entry)

    return {
        'unique_entries': len(unique),
        'duplicates_found': len(duplicates),
        'duplicates': duplicates,
        'entries': unique
    }
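
The title check above only catches titles that are identical after normalization. A fuzzy comparison could be sketched with the standard library's `difflib.SequenceMatcher`. The helper names `normalize_title` and `titles_match` and the 0.9 threshold are assumptions chosen for illustration; the threshold would need tuning against a real bibliography.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation and braces, and collapse whitespace."""
    return ' '.join(re.sub(r'[^a-z0-9\s]', '', title.lower()).split())


def titles_match(title_a: str, title_b: str, threshold: float = 0.9) -> bool:
    """Return True when two titles are near-identical after normalization."""
    a, b = normalize_title(title_a), normalize_title(title_b)
    if not a or not b:
        return False  # an empty title never counts as a match
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(titles_match('Deep Learning for {Climate} Prediction',
                   'deep learning for climate prediction.'))
```

Comparing every pair of titles this way is quadratic in the number of entries, which is usually acceptable for a personal bibliography but may need blocking (for example by year) on very large files.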

Field Formatting

import re


def clean_bibtex_entry(entry: dict) -> dict:
    """
    Clean and standardize a BibTeX entry.

    Returns a new dict; the input entry and its fields are not modified.
    """
    cleaned = entry.copy()
    fields = dict(cleaned['fields'])  # copy so the caller's dict is untouched

    # Standardize the author separator: BibTeX expects a lowercase " and "
    if 'author' in fields:
        authors = fields['author']
        authors = authors.replace(' AND ', ' and ')
        authors = authors.replace(' & ', ' and ')
        fields['author'] = authors

    # Normalize page ranges to a double hyphen (typeset as an en-dash).
    # Any run of hyphens or Unicode dashes becomes exactly "--", so ranges
    # that are already correct are left unchanged.
    if 'pages' in fields:
        fields['pages'] = re.sub(r'\s*[-\u2013\u2014]+\s*', '--', fields['pages'])

    # Protect acronyms (all-caps tokens of 2+ characters, e.g. CNN or 3D)
    # with braces so bibliography styles do not lowercase them.
    # Tokens that are already wrapped in braces are skipped.
    if 'title' in fields:
        fields['title'] = re.sub(
            r'(?<![{\w])((?=\w*[A-Z])[A-Z0-9]{2,})(?![}\w])',
            r'{\1}',
            fields['title'])

    # Strip resolver URL prefixes so the field holds the bare DOI
    if 'doi' in fields:
        fields['doi'] = re.sub(r'^https?://(dx\.)?doi\.org/', '',
                               fields['doi'].strip(), flags=re.IGNORECASE)

    # Remove empty fields
    fields = {k: v for k, v in fields.items() if v.strip()}
    cleaned['fields'] = fields

    return cleaned
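
After cleaning, entries usually need to be written back to the .bib file. The following is a minimal sketch of a serializer that aligns the `=` signs in the same way as the examples under Required Fields by Entry Type. The function name `format_bibtex_entry` and the `FIELD_ORDER` list are assumptions for illustration; the entry dict shape is assumed to match the output of `parse_bibtex_entries`.

```python
# Hypothetical preferred field order; any other fields follow alphabetically.
FIELD_ORDER = ['author', 'title', 'journal', 'booktitle', 'publisher',
               'year', 'volume', 'number', 'pages', 'doi', 'isbn']


def format_bibtex_entry(entry: dict) -> str:
    """Serialize an entry dict to BibTeX text with aligned '=' signs."""
    fields = entry['fields']
    ordered = [name for name in FIELD_ORDER if name in fields]
    ordered += sorted(name for name in fields if name not in FIELD_ORDER)
    # Pad every field name to the width of the longest one
    width = max((len(name) for name in ordered), default=0)
    lines = [f"  {name.ljust(width)} = {{{fields[name]}}}" for name in ordered]
    return f"@{entry['type']}{{{entry['key']},\n" + ",\n".join(lines) + "\n}"


entry = {
    'type': 'book',
    'key': 'bishop2006pattern',
    'fields': {
        'year': '2006',
        'title': 'Pattern Recognition and Machine Learning',
        'author': 'Bishop, Christopher M.',
        'publisher': 'Springer',
    },
}
print(format_bibtex_entry(entry))
```

A fixed field order keeps diffs small when the file is under version control, since re-running the cleanup does not reshuffle entries.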

DOI-Based Entry Generation

Fetch Complete BibTeX from DOI

import requests

def doi_to_bibtex(doi: str) -> str:
    """
    Retrieve a BibTeX entry for a DOI via content negotiation at doi.org
    (supported by registration agencies such as Crossref and DataCite).
    """
    url = f"https://doi.org/{doi}"
    headers = {'Accept': 'application/x-bibtex'}
    # A timeout keeps the call from hanging if the resolver is unreachable
    response = requests.get(url, headers=headers, allow_redirects=True, timeout=10)

    if response.status_code == 200:
        return response.text
    else:
        return f"% Error: Could not retrieve BibTeX for DOI {doi}"

# Example (requires network access)
bibtex = doi_to_bibtex('10.1038/s41586-021-03819-2')
print(bibtex)

Citation Key Conventions

Consistent citation keys improve readability:

Convention: authorYEARfirstword
Examples:
  smith2024deep
  lee2024attention
  bishop2006pattern

For multiple papers by the same author in the same year:
  smith2024a, smith2024b

For papers with many authors:
  smithetal2024deep  (use "etal" for 3+ authors)
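
The convention above can be sketched as a small key generator. The function name `make_citation_key`, the stop-word list, and the folding of accented names to ASCII are assumptions for illustration. Authors are expected in the "Last, First and Last, First" form used throughout this guide, and the a/b suffix for same-year collisions is not handled.

```python
import re
import unicodedata

# Hypothetical stop words skipped when picking the first title word.
STOP_WORDS = {'a', 'an', 'the', 'on', 'of', 'for', 'in', 'and', 'to'}


def _ascii_lower(text: str) -> str:
    """Fold accents to ASCII and keep only lowercase letters and digits."""
    folded = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode()
    return re.sub(r'[^a-z0-9]', '', folded.lower())


def make_citation_key(author: str, year: str, title: str) -> str:
    """Build an authorYEARfirstword key, adding 'etal' for 3+ authors."""
    authors = [a.strip() for a in re.split(r'\s+and\s+', author) if a.strip()]
    first = authors[0] if authors else ''
    # "Last, First" form: the surname is everything before the first comma
    surname = _ascii_lower(first.split(',')[0])
    suffix = 'etal' if len(authors) >= 3 else ''
    words = [_ascii_lower(w) for w in title.split()]
    first_word = next((w for w in words if w and w not in STOP_WORDS), '')
    return f"{surname}{suffix}{year}{first_word}"


print(make_citation_key('Smith, John A. and Doe, Jane B.', '2024',
                        'Deep Learning for Climate Prediction: A Comparative Study'))
# prints smith2024deep
```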

Validation Checklist

Before submitting a manuscript, validate your BibTeX file:

Every \cite{} in the manuscript has a matching entry in the .bib file
No orphaned entries (entries in .bib not cited in manuscript)
All entries have at minimum: author, title, year
All journal articles have: volume, pages (or article number), DOI
Page ranges use a double hyphen (--), which LaTeX typesets as an en-dash, not a single hyphen
No encoding errors in author names (check accented characters)
Proper nouns and acronyms in titles are protected with braces
No duplicate entries exist

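For automated validation, use biber --validate-datamodel (in biblatex projects) to check entries against the data model, or checkcites to report uncited entries and undefined citations.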
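
The first two checklist items can also be approximated with a short script. This is a rough sketch, not a replacement for the tools above: the function name `check_citations` is hypothetical, the regular expressions cover only common `\cite`-style commands (including variants such as `\citep` and `\parencite`), and `\nocite{*}`, files pulled in with `\input`, and commented-out citations are not handled.

```python
import re


def check_citations(tex_source: str, bib_source: str) -> dict:
    """Compare keys cited in LaTeX source with keys defined in a .bib file."""
    cited = set()
    # Matches \cite, \citep, \citet, \parencite, ... with optional [..] arguments
    cite_pattern = r'\\\w*cite\w*\*?(?:\[[^\]]*\])*\{([^}]*)\}'
    for match in re.finditer(cite_pattern, tex_source):
        cited.update(key.strip() for key in match.group(1).split(',') if key.strip())

    defined = set()
    for match in re.finditer(r'@(\w+)\s*\{\s*([^,\s}]+)\s*,', bib_source):
        if match.group(1).lower() not in ('comment', 'string', 'preamble'):
            defined.add(match.group(2))

    return {
        'missing': sorted(cited - defined),   # cited but not in the .bib file
        'orphaned': sorted(defined - cited),  # in the .bib file but never cited
    }


tex = r"Prior work \cite{smith2024deep, lee2024attention} and \citep[p.~3]{doe2020missing}."
bib = ("@article{smith2024deep,\n  title = {A}\n}\n"
       "@book{bishop2006pattern,\n  title = {B}\n}\n"
       "@inproceedings{lee2024attention,\n  title = {C}\n}")
print(check_citations(tex, bib))
# prints {'missing': ['doe2020missing'], 'orphaned': ['bishop2006pattern']}
```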
