Case Study

Launch a Plagiarism Sentinel

Verify academic integrity at scale.
Reveal the sources behind every borrowed paragraph.
Turn suspicious passages into defensible evidence.

Investigation

Professor Scribble receives a civic design assignment titled "Sunrise Community Power Plan". It outlines solar awnings, rainwater batteries, and PTA outreach in language that feels suspiciously professional. Let us help the professor trace where this language came from.

Open Access publishing has made scholarly articles incredibly easy to access, and to copy! Millions of peer-reviewed PDFs are only a search away. To protect academic integrity, every student's work should be vetted against the world's research. ScholarAPI turns this task into a simple, instant query.

Whether it's a freshman essay or a doctoral thesis, a plagiarism checker is only as strong as its reference library. ScholarAPI connects you to a massive volume of global literature, so far fewer sources slip through unchecked. The investigation begins with the text itself.

Student Assignment

"Our community center could slash utility bills by installing modular solar awnings over classrooms and routing the power to evening programs. We plan to hide rainwater batteries tucked under bleachers so pep rallies double as storage days. The budget leans on a city green bond grant playbook plus weekend solar fairs reduce bills messaging the PTA already loves. The plan also borrowed lessons from Harbor Institute, even though that case..."

A small number of snippets selected at random serve as unique clues.
Together, they triangulate the borrowed source with high confidence.


Shingles

[
  "modular solar awnings over classrooms",
  "rainwater batteries tucked under bleachers",
  "plus weekend solar fairs reduce",
  "from Harbor Institute, even though"
]

Fingerprints

To identify the real source, you need a handful of revealing passages, called "shingles". Each shingle is a short sequence of words that acts as a fingerprint for the source material.

Slide a window across the input text and select a fixed number of passages. Pick them at random or focus on rare, content-rich phrases while discarding common ones ("the study shows that"). Selecting more shingles increases the chance of identification, even if the student paraphrased some sections.
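Here is a minimal Python sketch of the windowing step. It assumes shingles of five whitespace-separated words, with punctuation left attached because the search tolerates it. The `COMMON_PHRASES` stop-list and both function names are hypothetical. A production checker would more likely judge rarity from corpus word frequencies than from a hand-written list.

```python
import random

# Hypothetical stop-list of boilerplate phrases; extend it for your domain.
COMMON_PHRASES = {"the study shows that", "in this paper we", "it is important to"}


def make_shingles(text: str, size: int = 5) -> list[str]:
    """Slide a window of `size` words across the text, one word at a time."""
    words = text.split()  # punctuation stays attached; the search tolerates it
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def pick_shingles(text: str, count: int = 4, size: int = 5, seed=None) -> list[str]:
    """Discard shingles containing a common phrase, then sample `count` of the rest."""
    candidates = [
        s for s in make_shingles(text, size)
        if not any(phrase in s.lower() for phrase in COMMON_PHRASES)
    ]
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(candidates, min(count, len(candidates)))
```

Sampling at random keeps the fingerprints spread across the document. If the text is shorter than one window, both functions return an empty list.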

Wrap shingles in quotes "..." and pass them to ScholarAPI's search engine. Punctuation (commas, dots) and common words (and, or, is, etc.) can remain; they will be handled automatically during search.
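To see exactly what reaches the server, this standard-library sketch builds the query string with one quoted q value per shingle. `build_search_query` is a hypothetical helper. Removing stray double quotes from inside a shingle is an assumption, made so that each phrase stays wrapped in a single pair of quotes.

```python
from urllib.parse import urlencode


def build_search_query(shingles: list[str]) -> str:
    """Encode one quoted q parameter per shingle, in the order given."""
    # Strip stray double quotes so each phrase stays wrapped in exactly one pair.
    pairs = [("q", '"' + s.replace('"', "") + '"') for s in shingles]
    return urlencode(pairs)  # spaces become "+", quotes become "%22"


query = build_search_query([
    "modular solar awnings over classrooms",
    "from Harbor Institute, even though",
])
print("https://scholarapi.net/api/v1/search?" + query)
```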

Ping for Suspects

To retrieve a list of likely sources, send all fingerprints to the /search endpoint using repeated q parameters. ScholarAPI returns metadata for up to 50 articles that contain at least one of the shingles, ranked by relevance. The documents with the most matches, your strongest suspects, appear at the top.

Search query

/api/v1/search
?q="modular solar awnings over classrooms"
&q="rainwater batteries tucked under bleachers"
&q="plus weekend solar fairs reduce"
&q="from Harbor Institute, even though"

import requests

phrases = [
    "modular solar awnings over classrooms",
    "rainwater batteries tucked under bleachers",
    # ... more phrases
]

# Repeat the "q" parameter once per shingle, wrapping each
# phrase in double quotes as in the query above
params = [("q", f'"{p}"') for p in phrases]

resp = requests.get(
    "https://scholarapi.net/api/v1/search",
    params=params,
    headers={"X-API-Key": "YOUR_KEY"},
    timeout=30,
)
resp.raise_for_status()

# Hits are ranked by relevance, strongest suspects first
for hit in resp.json()["results"]:
    print(hit["id"], hit["title"])

The Lineup

Each hit is a lead. Capture the IDs and fetch their full text to calculate specialized plagiarism scores offline.

Hit #1

{
  "id": "ae72c4",
  "title": "Community Microgrids for School Campuses",
  "authors": ["A. Delgado", "S. Hart"],
  "journal": "Journal of Civic Energy",
  "published_date": "2024-05-18",
  ...
}

Hit #2

{
  "id": "c491de",
  "title": "Rainwater Batteries for Public Gyms",
  "authors": ["J. Mensah", "L. Ortiz"],
  "journal": "Urban Sustainability Letters",
  "published_date": "2023-09-02",
  ...
}

Hit #3

{
  "id": "b0f81a",
  "title": "PTA Playbooks for Solar Outreach",
  "authors": ["Harbor Institute Energy Lab"],
  "journal": "Proceedings of Community Energy",
  "published_date": "2022-11-11",
  ...
}

Cross-Examination

With the id of every top-matching article in hand, call the /text endpoint to retrieve each suspect's full text. Then run your preferred similarity algorithm (Levenshtein distance, Jaccard similarity, vector embeddings, trigrams, or shingled comparison) to determine whether the match is plagiarism or a harmless overlap.
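As one example of the shingled comparison mentioned above, here is a sketch of Jaccard similarity over word trigrams. It also computes containment, the share of the student's trigrams that also occur in the source. Containment is included because a student paragraph is usually much shorter than the article, and that length gap drags plain Jaccard down. The function names and the trigram size are illustrative choices and are not part of ScholarAPI.

```python
import re


def word_shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, drop punctuation, and collect every run of n words."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shared shingles / all distinct shingles; symmetric but length-sensitive."""
    sa, sb = word_shingles(a, n), word_shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def containment(student: str, source: str, n: int = 3) -> float:
    """Fraction of the student's shingles that also appear in the source."""
    sa, sb = word_shingles(student, n), word_shingles(source, n)
    return len(sa & sb) / len(sa) if sa else 0.0
```

The evidence-slice strings shown later in this guide share 3 of 11 distinct trigrams. That gives a Jaccard score of 3/11 (about 0.27) but a containment of 0.5, because half of the student's trigrams appear in the source.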

When you need to score many IDs at once, use the bulk /texts endpoint to fetch up to 100 plain-text articles in a single call.
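The guide does not show the request format for /texts, so this sketch only prepares the batches. It removes duplicate IDs, keeping first-seen order, and splits the rest into groups of at most 100, the per-call limit stated above. `batches` is a hypothetical helper. Pass each group to /texts in whatever form the API reference specifies.

```python
from typing import Iterator

MAX_IDS_PER_CALL = 100  # limit stated for the bulk /texts endpoint


def batches(ids: list[str], size: int = MAX_IDS_PER_CALL) -> Iterator[list[str]]:
    """Yield de-duplicated IDs in groups of at most `size`, preserving order."""
    unique = list(dict.fromkeys(ids))  # dicts keep insertion order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```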

Running the scoring logic offline lets you build precise, defensible evidence. You can generate side-by-side comparisons and reports to support your findings, whether for a teacher, supervisor, or editor.

Full Text Evidence
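GET /api/v1/text/{id}

import requests
from difflib import SequenceMatcher

def check_similarity(student_text, article_id, threshold=0.35):
    # 1. Fetch the suspect article's full text
    resp = requests.get(
        f"https://scholarapi.net/api/v1/text/{article_id}",
        headers={"X-API-Key": "YOUR_KEY"},
        timeout=30,
    )
    resp.raise_for_status()
    article_text = resp.text

    # 2. Compare locally with difflib, a Ratcliff/Obershelp-style matcher.
    #    autojunk=False: by default, characters that are frequent in a
    #    200+ character text are ignored as junk, which wrecks prose scores.
    #    For the same copied content, ratio() shrinks as article_text grows.
    ratio = SequenceMatcher(None, student_text, article_text, autojunk=False).ratio()
    if ratio > threshold:
        print(f"Match found: {ratio:.2f}")
    return ratio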
Case Closed

Professor Scribble's suspicions were well-founded. With the help of ScholarAPI, your plagiarism checker located the source and produced side-by-side evidence that the assignment's wording was copied.

Evidence slice

student : "installing modular solar awnings over classrooms and routing"
source  : "to install modular solar awnings over the classrooms and routing"
similarity: 0.92

Your checker computes similarity offline and highlights the identical wording for the instructor.
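Here is a sketch of that highlighting step. It runs difflib's `get_matching_blocks()` on word lists and wraps each run of at least `min_words` consecutive shared words in brackets. The bracket markers and the three-word minimum are assumptions; an HTML report might use `<mark>` tags instead. The comparison is case- and punctuation-sensitive, so normalize both texts first if that matters for your corpus.

```python
from difflib import SequenceMatcher


def highlight(student: str, source: str, min_words: int = 3, marks=("[", "]")) -> str:
    """Bracket each run of >= min_words consecutive words shared with the source."""
    a, b = student.split(), source.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out, cursor = [], 0
    for block in matcher.get_matching_blocks():  # ends with a zero-size sentinel
        if block.size < min_words:
            continue
        out.extend(a[cursor:block.a])  # unmatched words before this run
        out.append(marks[0] + " ".join(a[block.a:block.a + block.size]) + marks[1])
        cursor = block.a + block.size
    out.extend(a[cursor:])
    return " ".join(out)


print(highlight(
    "installing modular solar awnings over classrooms and routing",
    "to install modular solar awnings over the classrooms and routing",
))
# installing [modular solar awnings over] [classrooms and routing]
```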

Sync Flow

GET /list
[
  { "id": "a1", "indexed_at": "2024-01-15T09:00Z" },
  { "id": "b2", "indexed_at": "2024-01-15T09:05Z" }
]

GET /list?indexed_after=2024-01-15T09:05Z
[
  { "id": "c3", "indexed_at": "2024-01-15T09:10Z" },
  ...
]

Offline Data

Large-scale anti-plagiarism tools often need their own offline index for maximum flexibility and performance. ScholarAPI makes this possible with the /list endpoint, which lets you iterate systematically over all publications in indexing order, from oldest to newest, and then retrieve their full texts for your local database.

To build the local index, call /list repeatedly, passing the indexed_after parameter with the indexed_at value of the last publication retrieved. Like before, you can filter the results with a search query, q, to collect publications from a specific domain, such as "civic design" or "paleontology". Locally, you can deploy custom similarity models, cross-lingual checks, or metadata tagging with campus-specific taxonomies.
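Here is a standard-library sketch of that loop. It rests on several assumptions. The URL is https://scholarapi.net/api/v1/list, with the /api/v1 prefix inferred from the other endpoints. Requests use the same X-API-Key header as earlier. Each response is a JSON array of objects with `id` and `indexed_at`, as in the Sync Flow example, and an empty page means you have caught up. `fetch_list_page` and `iterate_publications` are hypothetical names. The paging logic takes the fetch function as a parameter, so you can test it without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumption: the /api/v1 prefix is inferred from the /search and /text URLs.
LIST_URL = "https://scholarapi.net/api/v1/list"


def fetch_list_page(indexed_after: Optional[str], q: Optional[str] = None,
                    api_key: str = "YOUR_KEY") -> list:
    """Fetch one page of /list; assumes the body is a JSON array of publications."""
    params = []
    if indexed_after:
        params.append(("indexed_after", indexed_after))
    if q:
        params.append(("q", q))
    url = LIST_URL + ("?" + urllib.parse.urlencode(params) if params else "")
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iterate_publications(fetch_page: Callable[[Optional[str]], list],
                         start_after: Optional[str] = None) -> Iterator[dict]:
    """Walk /list from oldest to newest, resuming after the last indexed_at seen."""
    cursor = start_after
    while True:
        page = fetch_page(cursor)
        if not page:  # assumption: an empty page means the index is caught up
            return
        yield from page
        next_cursor = page[-1]["indexed_at"]
        if next_cursor == cursor:  # guard against a cursor that never advances
            return
        cursor = next_cursor
```

To mirror only one domain, wrap the fetcher: `iterate_publications(lambda after: fetch_list_page(after, q="civic design"))`. Persist the last `indexed_at` you processed so that a later sync can resume from it. If several publications can share one `indexed_at` value at a page boundary, check how the API breaks ties before relying on this cursor.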

Whether you are policing student essays, research manuscripts, or grant proposals, ScholarAPI provides a solid data foundation to detect misconduct and uphold academic integrity.

End of Guide

Ready to build your Plagiarism Checker?