Help:Wiki Replicas/Queries/Example queries
Atypical log entries
SELECT
actor_name,
log_namespace,
log_timestamp,
log_action,
log_title,
comment_text
FROM logging_userindex
JOIN `actor`
ON log_actor = actor_id
JOIN `comment`
ON comment_id = log_comment_id
WHERE log_type = 'delete'
AND log_action != 'restore'
AND log_action != 'delete';
Explanation: This pulls log entries from the deletion log that aren't restore or delete actions. Works on Toolforge
Broken redirects
- See also source code of Special:BrokenRedirects
SELECT
p1.page_namespace,
p1.page_title
FROM redirect AS rd
JOIN page p1
ON rd.rd_from = p1.page_id
LEFT JOIN page AS p2
ON rd_namespace = p2.page_namespace
AND rd_title = p2.page_title
WHERE rd_namespace >= 0
AND p2.page_namespace IS NULL
ORDER BY p1.page_namespace ASC;
Explanation: This pulls all broken redirects. Works on Toolforge
Cross-namespace redirects
SELECT COUNT(DISTINCT rd_from)
FROM redirect
INNER JOIN page ON redirect.rd_from = page.page_id
AND redirect.rd_namespace != 0
AND page.page_namespace = 0;
Explanation: This counts redirects from (Main) to any other namespace. Works on Toolforge
Deleted red-linked categories
SELECT
cattmp.cl_to,
cattmp.cl_count,
user_name,
log_timestamp,
log_comment
FROM logging
JOIN `user` ON log_user = user_id
JOIN
(SELECT
cl_to,
COUNT(cl_to) AS cl_count
FROM categorylinks
LEFT JOIN page ON cl_to = page_title
AND page_namespace = 14
WHERE page_title IS NULL
GROUP BY cl_to) AS cattmp
ON log_title = cattmp.cl_to
WHERE log_namespace = 14
AND log_type = "delete"
AND log_timestamp = (SELECT
MAX(log_timestamp)
FROM logging AS last
WHERE log_namespace = 14
AND cattmp.cl_to = last.log_title);
Explanation: This pulls non-existent used categories that have previously been deleted. Works on Toolforge
Empty categories
SELECT
page_title,
page_len
FROM categorylinks
RIGHT JOIN page ON cl_to = page_title
WHERE page_namespace = 14
AND page_is_redirect = 0
AND cl_to IS NULL
AND NOT EXISTS (SELECT
1
FROM categorylinks
WHERE cl_from = page_id
AND cl_to = 'Wikipedia_category_redirects')
AND NOT EXISTS (SELECT
1
FROM categorylinks
WHERE cl_from = page_id
AND cl_to = 'Disambiguation_categories')
AND NOT EXISTS (SELECT
1
FROM templatelinks
WHERE tl_from = page_id
AND tl_namespace = 10
AND tl_title = 'Empty_category');
Explanation: This pulls empty categories that aren't in specific categories and don't transclude a specific template. Works on Toolforge
Fully-protected articles with excessively long expiries
SELECT
page_is_redirect,
page_title,
user_name,
logs.log_timestamp,
pr_expiry,
logs.log_comment
FROM page
JOIN page_restrictions ON page_id = pr_page
AND page_namespace = 0
AND pr_type = 'edit'
AND pr_level = 'sysop'
AND pr_expiry > DATE_FORMAT(DATE_ADD(NOW(),INTERVAL 1 MONTH),'%Y%m%d%H%i%s')
AND pr_expiry != 'infinity'
LEFT JOIN logging AS logs ON logs.log_title = page_title
AND logs.log_namespace = 0
AND logs.log_type = 'protect'
LEFT JOIN `user` ON logs.log_user = user_id
WHERE CASE WHEN (NOT ISNULL(log_timestamp))
THEN log_timestamp = (SELECT MAX(last.log_timestamp)
FROM logging AS last
WHERE log_title = page_title
AND log_namespace = 0
AND log_type = 'protect')
ELSE 1 END;
Explanation: Articles that are fully-protected from editing for more than one month. Works on Toolforge
Excessively long IP blocks
SELECT
ipb_address,
ipb_by_actor,
ipb_timestamp,
ipb_expiry,
ipb_reason_id
FROM ipblocks
WHERE ipb_expiry > DATE_FORMAT(DATE_ADD(NOW(),INTERVAL 2 YEAR),'%Y%m%d%H%i%s')
AND ipb_expiry != "infinity"
AND ipb_user = 0;
Explanation: Blocks of anonymous users that are longer than two years (but not indefinite). Works on Toolforge
Indefinitely fully-protected articles
SELECT
page_is_redirect,
page_title,
user_name,
logs.log_timestamp,
logs.log_comment
FROM page
JOIN page_restrictions ON page_id = pr_page
AND page_namespace = 0
AND pr_type = 'edit'
AND pr_level = 'sysop'
AND pr_expiry = 'infinity'
LEFT JOIN logging AS logs ON logs.log_title = page_title
AND logs.log_namespace = 0
AND logs.log_type = 'protect'
LEFT JOIN `user` ON logs.log_user = user_id
WHERE CASE WHEN (NOT ISNULL(log_timestamp))
THEN log_timestamp = (SELECT MAX(last.log_timestamp)
FROM logging AS last
WHERE log_title = page_title
AND log_namespace = 0
AND log_type = 'protect')
ELSE 1 END;
Explanation: Articles indefinitely fully-protected from editing. Works on Toolforge
Long pages
SELECT
page_namespace,
page_title,
page_len
FROM page
WHERE page_len > 175000
AND page_title NOT LIKE "%/%"
ORDER BY page_namespace ASC;
Explanation: Pages over 175,000 bytes in length; excludes titles with "/" in them (to avoid archives, etc.). Works on Toolforge
Redirects obscuring page content
SELECT
page_namespace,
page_title,
page_len
FROM page
WHERE page_is_redirect = 1
HAVING page_len > 449
ORDER BY page_namespace ASC;
Explanation: Redirects with large page lengths. This usually indicates there is text below the redirect that should not be there. Works on Toolforge
Mistagged non-free content
SELECT
DISTINCT(enwiki_p.page.page_title),
commonswiki_p.image.img_name
FROM enwiki_p.image, commonswiki_p.image, enwiki_p.categorylinks, enwiki_p.page
WHERE enwiki_p.image.img_sha1 = commonswiki_p.image.img_sha1
AND enwiki_p.page.page_title = enwiki_p.image.img_name
AND enwiki_p.categorylinks.cl_from = enwiki_p.page.page_id
AND enwiki_p.categorylinks.cl_to = 'All_non-free_media'
AND enwiki_p.image.img_sha1 != 'phoiac9h4m842xq45sp7s6u21eteeq1';
Explanation: This pulls files on a local wiki that are in non-free category, but also exist at Commons. This indicates that either the local image or the Commons image should be deleted. It excludes the SHA1 empty string due to bad database rows.
Pages with the most revisions
SELECT
page_namespace,
ns_name,
page_title,
COUNT(*)
FROM revision
JOIN page ON page_id = rev_page
JOIN toolserver.namespace ON page_namespace = ns_id
AND dbname = 'enwiki_p'
GROUP BY page_namespace, page_title
ORDER BY COUNT(*) DESC
LIMIT 1000;
Explanation: This pulls the pages with the most revisions. On large wikis, it can take several hours (or days) to run.
Page counts by namespace
SELECT
page_namespace,
MAX(notredir),
MAX(redir)
FROM (
SELECT page.page_namespace,
IF( page_is_redirect, COUNT(page.page_namespace), 0 ) AS redir,
IF( page_is_redirect, 0, COUNT(page.page_namespace)) AS notredir
FROM page
GROUP BY page_is_redirect, page_namespace
ORDER BY page_namespace, page_is_redirect
) AS pagetmp
GROUP BY page_namespace;
Explanation: This pulls the number of redirects and non-redirects in each namespace. Works on Toolforge
Orphaned talk pages
SELECT
p1.page_namespace,
p1.page_title
FROM page AS p1
WHERE p1.page_title NOT LIKE "%/%"
AND p1.page_namespace NOT IN (0,2,3,4,6,8,9,10,12,14,16,18,100,102,104,108, 118, 710, 828, 2300, 2600)
AND CASE WHEN p1.page_namespace = 1
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 0
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 5
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 4
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 7
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 6
AND p1.page_title = p2.page_title)
AND NOT EXISTS (SELECT
1
FROM commonswiki_p.page AS p2
WHERE p2.page_namespace = 6
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 11
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 10
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 13
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 12
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 15
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 14
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 17
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 16
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 101
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 100
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 109
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 108
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 119
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 118
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 711
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 710
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 829
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 828
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 2301
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 2300
AND p1.page_title = p2.page_title)
ELSE 1 END
AND p1.page_id NOT IN (SELECT
page_id
FROM page
JOIN templatelinks
ON page_id = tl_from
WHERE tl_title="G8-exempt"
AND tl_namespace = 10)
AND p1.page_id NOT IN (SELECT
page_id
FROM page
JOIN templatelinks
ON page_id = tl_from
WHERE tl_title="Rtd"
AND tl_namespace = 10);
Explanation: This (very, very hackishly) pulls all pages in the talk namespaces that don't have a corresponding subject-space page. It JOINs against Commons to ensure that File_talk: pages are truly orphaned. It also has some en.wiki-specific template checks in it. Works on Toolforge
SELECT page_namespace, page_title, page_len, page_is_redirect, page_is_new FROM page AS p1
WHERE
p1.page_namespace % 2 = 1
AND (SELECT COUNT(*) FROM page AS p2 WHERE p1.page_title = p2.page_title AND p2.page_namespace = (p1.page_namespace - 1)) = 0
AND p1.page_namespace != 3
Ownerless pages in the user space
SELECT
page_namespace,
page_title,
page_len
FROM page
LEFT JOIN `user`
ON user_name = REPLACE(page_title, '_', ' ')
WHERE page_namespace IN (2,3)
AND page_is_redirect = 0
AND page_title NOT LIKE "%/%"
AND page_title NOT RLIKE "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
AND user_name IS NULL;
Explanation: This pulls all User: and User_talk: pages not belonging to a registered user. Pages belonging to an anonymous user are excluded. Works on Toolforge
- This doesn't exclude IPv6 addresses.
Polluted categories
SELECT DISTINCT
cl_to
FROM categorylinks AS cat
JOIN page AS pg1
ON cat.cl_from = pg1.page_id
WHERE page_namespace = 2
AND EXISTS (SELECT
1
FROM page AS pg2
JOIN categorylinks AS cl
ON pg2.page_id = cl.cl_from
WHERE pg2.page_namespace = 0
AND cat.cl_to = cl.cl_to)
AND cl_to NOT IN (SELECT
page_title
FROM page
JOIN templatelinks
ON tl_from = page_id
WHERE page_namespace = 14
AND tl_namespace = 10
AND tl_title = 'Pollutedcat')
LIMIT 250;
Explanation: This pulls categories that contain pages in the (Main) namespace and the User: namespace. Generally categories hold one or the other. Works on Toolforge
Categories categorized in red-linked categories
SELECT
page_title,
cl_to
FROM page
JOIN
(SELECT
cl_to,
cl_from
FROM categorylinks
LEFT JOIN page ON cl_to = page_title
AND page_namespace = 14
WHERE page_title IS NULL) AS cattmp
ON cattmp.cl_from = page_id
WHERE page_namespace = 14;
Explanation: This pulls categories categorized in red-linked categories. Works on Toolforge
Articles containing red-linked files
SELECT
page_title,
il_to
FROM page
JOIN imagelinks
ON page_id = il_from
WHERE (NOT EXISTS(
SELECT
1
FROM image
WHERE img_name = il_to))
AND (NOT EXISTS(
SELECT
1
FROM commonswiki_f_p.page
WHERE page_title = il_to
AND page_namespace = 6))
AND page_namespace = 0;
Explanation: This pulls articles that contain red-linked images. It checks both Commons and the local wiki. Works on Toolforge
Articles in a specific category containing red-links
SELECT
page_title
FROM page
JOIN pagelinks
ON pl_from = page_id
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = "Mass_Rapid_Transit_(Singapore)_stations"
AND pl_namespace = 0
AND (NOT EXISTS(
SELECT
1
FROM enwiki_p.page
WHERE page_title = pl_title
AND page_namespace = 0))
AND page_namespace = 0
LIMIT 100;
Explanation: This pulls articles that contain red-links from a specific category.
Self-categorized categories
SELECT
page_title,
cat_pages,
cat_subcats
FROM page
JOIN categorylinks ON cl_to = page_title
RIGHT JOIN category
ON cat_title = page_title
WHERE page_id = cl_from
AND page_namespace = 14;
Explanation: This pulls self-categorized categories. Works on Toolforge
Uncategorized categories
SELECT
page_title,
page_len,
cat_pages,
rev_timestamp,
rev_user_text
FROM revision
JOIN
(SELECT
page_id,
page_title,
page_len,
cat_pages
FROM category
RIGHT JOIN page ON cat_title = page_title
LEFT JOIN categorylinks ON page_id = cl_from
WHERE cl_from IS NULL
AND page_namespace = 14
AND page_is_redirect = 0) AS pagetmp
ON rev_page = pagetmp.page_id
AND rev_timestamp = (SELECT MAX(rev_timestamp)
FROM revision AS last
WHERE last.rev_page = pagetmp.page_id);
Explanation: This pulls uncategorized categories.
Pages in a specific category using a specific template
SELECT
page_title
FROM page
JOIN templatelinks
ON tl_from = page_id
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = "Living_people"
AND tl_namespace = 10
AND tl_title = "Fact"
AND page_namespace = 0
LIMIT 500;
Explanation: This pulls pages that are in a specific category and are using a specific template.
Pages in a specific category linking to a specific page
SELECT
page_title
FROM page
JOIN pagelinks
ON pl_from = page_id
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = "Mass_Rapid_Transit_(Singapore)_stations"
AND pl_namespace = 0
AND pl_title = "High-volume_low-speed_fan"
AND page_namespace = 0
LIMIT 500;
Explanation: This pulls pages that are in a specific category that link to a specific page.
Top edit timestamp for a category of users
SELECT
rev_user_text,
rev_timestamp
FROM revision
JOIN (SELECT
page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'Category_name_goes_here'
AND page_namespace = 2
AND page_title NOT LIKE '%/%') AS cltmp
ON REPLACE(cltmp.page_title, '_', ' ') = rev_user_text
WHERE rev_timestamp = (SELECT
MAX(rev_timestamp)
FROM revision
WHERE rev_user_text = REPLACE(cltmp.page_title, '_', ' '));
Explanation: This will take the user pages that do not contain a forward slash ("/") in "Category_name_goes_here" and get the top edit timestamp for each user.
Top pages by in a specific namespace for a specific user
SELECT
`page_title`,
COUNT(*)
FROM `revision`
JOIN `page`
ON `page_id` = `rev_page`
JOIN `user`
ON `user_id` = `rev_user`
WHERE `user_name` = 'User name'
AND `page_namespace` = 4
GROUP BY `page_title`
ORDER BY COUNT(*) DESC
LIMIT 25;
Explanation: This will pull the page titles of a the most-edited pages by a specific user in a specific namespace.
Number of deletions per day for a specific user
SELECT
DATE_FORMAT(log_timestamp, "%Y-%m-%d") AS day,
COUNT(log_timestamp) AS deletions
FROM logging_userindex
JOIN actor ON log_actor = actor_id
WHERE actor_name = 'User name'
AND log_type = 'delete' AND log_action = 'delete'
GROUP BY day DESC;
Explanation: This retrieves the number of deletions per day by a specific user.
Number of deletions per day
SELECT
DATE_FORMAT(log_timestamp, "%Y-%m-%d") AS day,
COUNT(log_id) AS deletions
FROM logging_userindex
WHERE log_type = 'delete'
AND log_action = 'delete'
GROUP BY day;
Explanation: This will pull the number of deletions per day on a given wiki. Works on Toolforge
Most common deletion summaries
SELECT
comment_text,
COUNT(log_id) AS uses
FROM logging
JOIN `comment` ON log_comment_id = comment_id
WHERE log_type = 'delete'
AND log_action = 'delete'
GROUP BY comment_text
ORDER BY uses DESC
Explanation: This will pull the most common deletion summaries on a given wiki.
Most common deletion summaries for a specific user
SELECT
comment_text,
COUNT(*) AS uses
FROM logging_userindex
JOIN `comment` ON log_comment_id = comment_id
JOIN `actor` ON log_actor = actor_id
WHERE actor_name = 'User name'
AND log_type = 'delete'
AND log_action = 'delete'
GROUP BY comment_text
ORDER BY uses DESC
LIMIT 25;
Explanation: This will pull the most commonly used deletion summaries for a specific user. Works on Toolforge
Most common edit summaries for a specific user
SELECT
comment_text,
COUNT(*) AS uses
FROM revision_userindex
JOIN `comment` on rev_comment_id = comment_id
JOIN actor ON rev_actor = actor_id
WHERE actor_name = 'User name'
GROUP BY comment_text
ORDER BY uses DESC
LIMIT 25;
Explanation: This will pull the most commonly used edit summaries for a specific user. Works on Toolforge
Number of revisions per day
SELECT
DATE_FORMAT(rev_timestamp, "%Y-%m-%d") AS day,
COUNT(rev_timestamp) AS revisions
FROM revision
GROUP BY day;
Explanation: This will pull the number of (non-deleted) revisions to a particular wiki and group the numbers by day. Works on Toolforge
Number of revisions per day by a specific user
SELECT
DATE_FORMAT(rev_timestamp, "%Y-%m-%d") AS day,
COUNT(rev_timestamp) AS revisions
FROM revision_userindex
JOIN actor on rev_actor = actor_id
WHERE actor_name = 'User name'
GROUP BY day;
Explanation: This will pull the number of (non-deleted) revisions by a specific user per day. Works on Toolforge
Most common templates in a category
SELECT
tl.tl_title,
COUNT(*) usages
FROM page p
JOIN categorylinks cl
ON p.page_namespace = 6 -- Optional namespace check.
AND cl.cl_from = p.page_id
AND cl.cl_to = "Files_with_no_machine-readable_license" -- Insert your category here.
JOIN templatelinks tl
ON tl.tl_from = p.page_id
WHERE tl.tl_namespace = 10
GROUP BY tl.tl_title
ORDER BY usages DESC
LIMIT 50;
Explanation: This example selects all templates (in namespace 10) used by pages in a certain namespace (6) and category, groups the results by template name and sorts the template names by frequency.
File description pages without an associated file
SELECT
page_title
FROM page
WHERE NOT EXISTS (SELECT
img_name
FROM image
WHERE img_name = page_title)
AND NOT EXISTS (SELECT
img_name
FROM commonswiki_f_p.image
WHERE img_name = page_title)
AND page_namespace = 6
AND page_is_redirect = 0
LIMIT 1000;
Explanation: This will pull file description pages that do not have an associated file, either locally or in the Commons repo. Works on Toolforge
Files without an associated file description page
SELECT
img_name
FROM image
WHERE NOT EXISTS (SELECT
page_title
FROM page
WHERE img_name = page_title
AND page_namespace = 6)
AND NOT EXISTS (SELECT
page_title
FROM commonswiki_p.page
WHERE img_name = page_title
AND page_namespace = 6);
Explanation: This will pull files that have no associated file description page, either locally or in the Commons repo.
File description pages containing no templates
SELECT
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON dbname = 'enwiki_p'
AND ns_id = page_namespace
LEFT JOIN templatelinks
ON tl_from = page_id
WHERE NOT EXISTS (SELECT
img_name
FROM commonswiki_p.image
WHERE img_name = page_title)
AND page_namespace = 6
AND page_is_redirect = 0
AND tl_from IS NULL
LIMIT 800;
Explanation: This will pull file description pages containing no templates that do not have an associated file description page in the Commons repo.
File description pages containing no templates or categories
SELECT
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON dbname = 'enwiki_p'
AND ns_id = page_namespace
LEFT JOIN templatelinks
ON tl_from = page_id
LEFT JOIN categorylinks
ON cl_from = page_id
WHERE NOT EXISTS (SELECT
img_name
FROM commonswiki_p.image
WHERE img_name = page_title)
AND page_namespace = 6
AND page_is_redirect = 0
AND tl_from IS NULL
AND cl_from IS NULL
LIMIT 800;
Explanation: This will pull file description pages containing no templates or categories that do not have an associated file description page in the Commons repo.
Links on a particular page
SELECT
pl_namespace,
pl_title
FROM page
JOIN pagelinks
ON pl_from = page_id
WHERE page_namespace = 0
AND page_title = 'Don\'t_poke_the_bear';
Explanation: This will pull the names of the links on a particular page. For example, the text of the "Don't poke the bear" page contains the link "[[bear]]" so the output of this query will list "Bear" as a result.
Links to a particular page
SELECT
page_namespace,
page_title
FROM page
JOIN pagelinks
ON pl_from = page_id
WHERE pl_namespace = 0
AND pl_title = 'Don\'t_poke_the_bear';
Explanation: This will pull the names of the links to a particular page. This is the equivalent of Special:WhatLinksHere. For example, the page "What not to do" may contain the link "[[Don't poke the bear]]"; this query would output "What not to do" as a result.
Pages containing 0 page links
SELECT
page_namespace,
page_title
FROM page
LEFT JOIN pagelinks
ON pl_from = page_id
WHERE pl_namespace IS NULL
LIMIT 1000;
Explanation: This will pull pages that contain 0 page links. This will not account for things like image links, template links, category links, or external links. Works on Toolforge
Pages with 0 links to them
SELECT
page_namespace,
page_title
FROM page
LEFT JOIN pagelinks
ON pl_title = page_title
AND pl_namespace = page_namespace
WHERE pl_namespace IS NULL
LIMIT 1;
Explanation: This will pull pages that have 0 links to them (Special:WhatLinksHere would be empty!).
Less linked to pages
SELECT
page_namespace,
page_title,
COUNT(*) AS count
FROM page
JOIN pagelinks ON (pl_namespace=page_namespace AND page_title=pl_title)
WHERE page_namespace=0
GROUP BY page_id
ORDER BY count ASC
LIMIT 100;
Explanation: This will pull the first 100 pages that -although linked to- have the less links to them.
Short pages with a single author (excluding user pages and redirects)
SELECT
CONCAT(ns_name, ':', page_title),
page_len
FROM page
JOIN toolserver.namespace
ON page_namespace = ns_id
AND dbname = 'enwiki_p'
LEFT JOIN templatelinks
ON tl_from = page_id
WHERE page_len < 50
AND page_is_redirect = 0
AND page_namespace NOT IN (2, 3)
AND tl_from IS NULL
AND (SELECT
COUNT(DISTINCT rev_user_text)
FROM revision
WHERE rev_page = page_id) = 1
ORDER BY page_len ASC
Explanation: This will list short pages with a single author (excluding user pages and redirects).
Unused templates
SELECT
CONCAT('Template:', page_title)
FROM page
LEFT JOIN templatelinks
ON page_namespace = tl_namespace
AND page_title = tl_title
WHERE page_namespace = 10
AND tl_from IS NULL
Explanation: This will pull unused templates.
Unused files in a category
SELECT CONCAT("\n|-\n| [[:File:", img_name, "]] ||", img_user_text, "\n") as UnusedCopyvios
FROM image
JOIN page
ON page_namespace = 6
AND page_title = img_name
JOIN categorylinks
ON cl_from = page_id
AND cl_type = "file"
AND cl_to = "Files_with_no_machine-readable_license"
LEFT JOIN imagelinks
ON il_to = img_name
WHERE il_to IS NULL
GROUP BY img_name;
This query selects all files in the wiki, then joins further tables to determine which are unused by any article and are contained in a specific category, in this case "Files_with_no_machine-readable_license". Other tracking categories can be found at Special:TrackingCategories.
Notice the usage of LEFT JOIN + IS NULL to determine which rows are present in the first table but missing in the second. This is still fast because we first used JOIN to drastically reduce the number of rows involved: categorylinks and imagelinks easily have millions of rows even on a wiki with few thousands pages and files.
Files with a category but not another
Similar to the above, but with multiple category filters: uses multiple LEFT JOIN + IS NULL on the same table. Here the output is formatted for <gallery></gallery> tags.
SELECT CONCAT("File:", img_name, "| ", img_user_text) as UnusedCopyvios
FROM image
JOIN page
ON page_namespace = 6
AND page_title = img_name
JOIN categorylinks
ON cl_from = page_id
AND cl_type = 'file'
AND cl_to = "Files_with_no_machine-readable_author"
LEFT JOIN categorylinks c
ON c.cl_from = page_id
AND c.cl_to = "Files_with_no_machine-readable_source"
WHERE c.cl_to IS NULL
GROUP BY img_name;
Broken image links
SELECT
CONCAT("* [[", if (page_namespace = 0, page_title, concat(":", ns_name, ":", page_title)), "]]",
" - [[:Bilde:", il_to, "]]") as links
FROM imagelinks
JOIN page
ON page_id = il_from
JOIN toolserver.namespace
ON ns_id = page_namespace
AND dbname = "nowiki_p"
LEFT JOIN image as I
ON I.img_name = il_to
LEFT JOIN commonswiki_p.image as J
ON J.img_name = il_to
WHERE I.img_name IS NULL
AND J.img_name IS NULL;
Explanation: List references to images that are not present locally nor on Commons (nowiki).
Revision counting for specific user till specific time
SELECT
COUNT(*)
FROM revision
WHERE rev_user_text = 'User name'
AND rev_timestamp < '20120327000000';
Explanation: Return the number of revisions by a specific user up to a certain time.
Log count in recent time by users not in group
A query like this outputs the users with:
- the most log actions of the kind specified (in the example, patrol/patrol minus autopatrolled edits),
- in the interval specified (in the example, last 24 months),
- excluding users in some user groups (in the example, sysop and rollbacker).
/* SLOW_OK adapted from Krinkle, bug 25799 */
SELECT
count(*) AS counter,
user_name,
user_editcount
FROM logging log
JOIN user us
ON log.log_user = us.user_id
JOIN user_groups ug
ON log.log_user = ug.ug_user
WHERE ug.ug_user NOT IN
(SELECT DISTINCT user_groups.ug_user
FROM user_groups
WHERE ug_group = 'sysop'
OR ug_group = 'rollbacker'
)
AND log.log_type = 'patrol'
AND log.log_action = 'patrol'
AND log.log_timestamp > ( NOW() - INTERVAL 24 MONTH )
AND ( log.log_params LIKE '%"6::auto";i:0%'
OR log.log_params LIKE '%\n0' /* not autopatrolled */ )
GROUP BY log.log_user
ORDER BY counter DESC
LIMIT 50;
In wikis where "patrol" right is given to all users, the example query can be used to find active patrollers who would be likely candidates for additional rights (like autropatrolled, rollbacker, sysop etc. depending on the wiki's configuration).
Redirects with more than one revision
SELECT
page_title,
rd_namespace,
rd_title,
COUNT(*) as edits
FROM revision
INNER JOIN page
ON rev_page = page_id
INNER JOIN redirect
ON rev_page = rd_from
WHERE page_namespace = 0
AND page_is_redirect = 1
GROUP BY page_title
HAVING COUNT(*) > 1;
Explanation: This query lists redirect pages with more than one revision.
Short non-disambiguation pages
SELECT
page_title
FROM page
WHERE page_namespace = 0
AND page_is_redirect = 0
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_title = 'Disambiguation'
AND tl_namespace = 10)
ORDER BY page_len ASC
LIMIT 1000;
Explanation: This query lists 1000 non-disambiguation short pages.
List of external links
SELECT
COUNT(*),
TRIM(LEADING 'www.' FROM SUBSTRING_INDEX(SUBSTRING_INDEX(el_to, '/', 3),'/',-1)) AS site
FROM externallinks, page
WHERE el_from = page_id
AND page_namespace = 0
GROUP BY 2
HAVING COUNT(*) > 10
ORDER BY 1;
Explanation: This query compiles a list of external links (from the main namespace) grouped by website.
List of external links from all wikis
List of external links to a given domain from all namespaces on all Wikimedia projects; requires iterating over wikis, with a bash script in the example, syntax like ./extlinks.sh http://meta.wikimedia.org
(outputs DB name, page title, linked URL).
#!/bin/bash
WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND is_closed = 0;')
for dbname in $WIKIS;
do
echo $dbname
echo "
/* SLOW_OK */
SELECT CONCAT(ns_name, ':', page_title), el_to
FROM externallinks
JOIN page ON el_from = page_id
JOIN toolserver.namespacename ON dbname = '$dbname' AND ns_is_favorite AND ns_id = page_namespace
WHERE el_to LIKE '$1%'
-- " | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
# mysql switches
# -c Prevent comment stripping, need to prevent the query killer
# -N No column name heading
# -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
done;
List of interwiki links from all wikis
As above; use lowercase interwiki prefix, like ./iwlinks.sh meatball
.
#!/bin/bash
WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND is_closed = 0;')
for dbname in $WIKIS;
do
echo $dbname
echo "
/* SLOW_OK */
SELECT CONCAT(ns_name, ':', page_title), CONCAT('$1:', iwl_title)
FROM iwlinks
JOIN page ON iwl_from = page_id
JOIN toolserver.namespacename ON dbname = '$dbname' AND ns_is_favorite AND ns_id = page_namespace
WHERE iwl_prefix = '$1'
-- " | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
# mysql switches
# -c Prevent comment stripping, need to prevent the query killer
# -N No column name heading
# -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
done;
User activity
Activity of registered users, as edits per day. This query filters out flagged bots but might still miss though.
select
rev_user,
count(*) / ( unix_timestamp(max(rev_timestamp)) - unix_timestamp(min(rev_timestamp)) ) * 86400.0 as edits_per_day
from revision left join user_groups
on rev_user = ug_user
where (user_id is null or ug_group <> 'bot') and rev_user > 0
group by rev_user;
Or with some more flexibility to filter users not in group X:
SELECT
rev_user_text,
count(*) / ( unix_timestamp(max(rev_timestamp)) - unix_timestamp(min(rev_timestamp)) ) * 86400.0 as edits_per_day
FROM revision
LEFT JOIN user_groups
ON rev_user = ug_user
WHERE ug_user NOT IN
(SELECT DISTINCT user_groups.ug_user
FROM user_groups
WHERE ug_group = 'bot'
OR ug_group = 'flowbot'
)
GROUP BY rev_user
ORDER BY edits_per_day DESC
LIMIT 1000;
Pages with revision count and editor count and sum of length of all revisions
This query works on the WMF Replica db
SELECT
rev_page AS page_id,
count(*) AS revisions,
count(distinct rev_actor) AS editors,
sum(rev_len) AS size
FROM revision
GROUP BY rev_page;
Top 10 articles by sum of length of all revisions
This query works on the WMF Slave db
SELECT
rev_page,
count(distinct actor_name),
sum(rev_len) as length,
page_title as title
FROM revision_userindex
INNER JOIN page
ON page_id = rev_page
INNER JOIN actor_revision
ON rev_actor = actor_id
GROUP BY rev_page
ORDER BY length DESC
LIMIT 10;
Top links, red and blue
This query works on the WMF Slave db
SELECT
pl_title,
count(*),
page_id
FROM enwiki.pagelinks
LEFT JOIN enwiki.page ON
pl_title = page_title
WHERE
pl_namespace = 0 AND
page_namespace = 0
GROUP BY 1
First messages to users, by tool/bot used (if any)
SELECT
page_id, page_title, rev_comment,rev_user, rev_user_text,
cast(rev_timestamp as datetime) as rev_timestamp, cast(user_registration as datetime) as user_registration,
datediff(rev_timestamp,user_registration) as datediff_msg_reg, timediff(rev_timestamp,user_registration) as timediff_msg_reg,
DATE_FORMAT(user_registration,"%Y-%m-%d") as user_reg_ymd, DATE_FORMAT(rev_timestamp,"%Y-%m-%d") as first_msg_ymd, user_id,
(rev_comment LIKE '%WP:HG%' OR rev_comment LIKE '%WP:HUGGLE%') as huggle, (rev_comment LIKE '%WP:TW%') tw,
(rev_comment LIKE '%igloo|GLOO]]%') as gloo,
(rev_comment LIKE '%WP:STiki%') as stiki,
(rev_user_text LIKE 'ClueBot%') as cluebot,
(rev_user_text = 'XLinkBot') as xlinkbot,
(rev_user_text = 'CorenSearchBot') as corenbot,
(rev_user_text LIKE 'BetacommandBot%') as betacommandbot,
(rev_user_text LIKE '%Bot') as bot,
(rev_user_text LIKE page_title) as self
FROM enwiki.page
LEFT JOIN enwiki.revision ON page_id = rev_page
LEFT JOIN enwiki.user ON replace(page_title,"_"," ") = user_name
WHERE page_namespace = 3 AND LOCATE('/',page_title) = 0
GROUP BY page_id;
Number of registrations per day
select DATE_FORMAT(log_timestamp,"%Y-%m-%d"), count(*)
from enwiki.logging
where log_action = 'create' AND log_type='newusers'
group by 1
order by log_timestamp;
Automated tool and bot edits
UPDATE rev_table SET tool = 'rollback' WHERE rev_comment RLIKE "(Reverted ([0-9]+ )?edits by \[\[Special:Contributions/[^\|]+\|[^]]+\]\] \(\[\[User talk:[^\|]+\|talk\]\]\) to last version by .+)";
UPDATE rev_table SET tool = 'undo' WHERE rc_comment LIKE "Undid revision%";
UPDATE rev_table SET tool = 'huggle' WHERE rc_comment LIKE '%WP:HG%' OR rc_comment LIKE '%WP:HUGGLE%';
UPDATE rev_table SET tool = 'huggle' WHERE rc_comment RLIKE "(Message re\. \[\[[^]]+\]\])|(Level [0-9]+ warning re\. \[\[[^]]+\]\])";
UPDATE rev_table SET tool = 'twinkle' WHERE rc_comment LIKE '%WP:TW%';
UPDATE rev_table SET tool = 'friendly' WHERE rc_comment LIKE '%WP:FRIENDLY%' OR rc_comment LIKE '%WP:Friendly%';
UPDATE rev_table SET tool = 'vandalproof' WHERE rc_comment LIKE '%WP:VPRF%' OR rc_comment LIKE '%WP:VandalProof%' OR rc_comment LIKE '%VandalProof|VandalProof%' OR rc_comment LIKE '%WP:VP2%' OR rc_comment LIKE '%WP:VandalProof%';
UPDATE rev_table SET tool = 'stiki' WHERE rc_comment LIKE '%|STiki]]%';
UPDATE rev_table SET tool = 'npwatcher' WHERE rc_comment LIKE '%|NPWatcher%';
UPDATE rev_table SET tool = 'vandalsniper' WHERE rc_comment LIKE '%|VandalSniper%';
UPDATE rev_table SET tool = 'wikimonitor' WHERE rc_comment LIKE '%m:WikiMonitor%';
UPDATE rev_table SET tool = 'mwt' WHERE rc_comment LIKE '%MWT|MWT]]%';
UPDATE rev_table SET tool = 'awb' WHERE rc_comment LIKE '%AWB|AWB]]%' OR rc_comment LIKE '%AutoWikiBrowser%';
UPDATE rev_table SET tool = 'cluebot' WHERE lower(cast(rc_user_text as CHAR)) LIKE 'cluebot%';
UPDATE rev_table SET tool = 'antivandalbot' WHERE rc_user_text = 'AntiVandalBot';
UPDATE rev_table SET tool = 'orphanbot' WHERE rc_user_text = 'OrphanBot';
UPDATE rev_table SET tool = 'pseudobot' WHERE rc_user_text = 'PseudoBot';
UPDATE rev_table SET tool = 'voabot' WHERE rc_user_text LIKE 'VoABot%';
UPDATE rev_table SET tool = 'martinbot' WHERE rc_user_text = 'MartinBot';
UPDATE rev_table SET tool = 'stbot' WHERE rc_user_text LIKE 'STBot%';
UPDATE rev_table SET tool = 'sqlbot' WHERE rc_user_text = 'SQLbot';
UPDATE rev_table SET tool = 'tawkerbot' WHERE rc_user_text LIKE 'Tawkerbot%';
UPDATE rev_table SET tool = 'sinebot' WHERE rc_user_text = 'SineBot';
UPDATE rev_table SET tool = 'csdwarnbot' WHERE rc_user_text = 'CSDWarnBot';
UPDATE rev_table SET tool = 'antispambot' WHERE rc_user_text = 'AntiSpamBot';
UPDATE rev_table SET tool = 'coibot' WHERE rc_user_text = 'COIBot';
UPDATE rev_table SET tool = 'corensearchbot' WHERE rc_user_text = 'CorenSearchBot';
UPDATE rev_table SET tool = 'anomiebot' WHERE rc_user_text = 'AnomieBOT';
UPDATE rev_table SET tool = 'smackbot' WHERE rc_user_text = 'SmackBot';
UPDATE rev_table SET tool = 'xlinkbot' WHERE rc_user_text = 'XLinkBot';
UPDATE rev_table SET tool = 'yobot' WHERE rc_user_text = 'YoBot';
UPDATE rev_table SET tool = 'hostbot' WHERE rc_user_text = 'HostBot';
UPDATE rev_table SET tool = 'mw_message' WHERE rc_user_text = 'MediaWiki message delivery';
UPDATE rev_table SET tool = 'sigmabot' WHERE rc_user_text = 'Lowercase sigmabot III';
UPDATE rev_table SET tool = 'hasteurbot' WHERE rc_user_text = 'HasteurBot';
UPDATE rev_table SET tool = 'legobot' WHERE rc_user_text = 'Legobot';
UPDATE rev_table SET tool = 'dplbot' WHERE rc_user_text = 'DPL bot';
UPDATE rev_table SET tool = 'b-bot' WHERE rc_user_text = 'B-bot';
UPDATE rev_table SET tool = 'suggestbot' WHERE rc_user_text = 'SuggestBot';
UPDATE rev_table SET tool = 'referencebot' WHERE rc_user_text = 'ReferenceBot';
UPDATE rev_table SET tool = 'aaalertbot' WHERE rc_user_text = 'AAlertBot';
UPDATE rev_table SET tool = 'wpcleaner' WHERE rc_comment LIKE "%WP:CLEANER%";
UPDATE rev_table SET tool = 'phantombot' WHERE rc_user_text = 'ThePhantomBot';
UPDATE rev_table SET tool = 'veblenbot' WHERE rc_user_text = 'VeblenBot';
UPDATE rev_table SET tool = 'mediationbot' WHERE rc_user_text = 'MediationBot';
UPDATE rev_table SET tool = 'adminstatsbot' WHERE rc_user_text = 'AdminStatsBot';
UPDATE rev_table SET tool = 'listeriabot' WHERE rc_user_text = 'ListeriaBot';
UPDATE rev_table SET tool = 'musikbot' WHERE rc_user_text = 'MusikBot';
UPDATE rev_table SET tool = 'aaalertbot' WHERE rc_user_text = 'AAlertBot';
UPDATE rev_table SET tool = 'awb-suspected' WHERE (rc_comment LIKE "% awb %" OR rc_comment LIKE "% AWB %") AND tool is NULL;
UPDATE rev_table SET tool = 'awb' WHERE rc_comment LIKE "% via awb %";
UPDATE rev_table SET tool = 'wikilove' WHERE rc_comment LIKE "%new WikiLove message%";
UPDATE rev_table SET tool = 'afch' WHERE rc_comment LIKE '%WP:AFCH%';
UPDATE rev_table SET tool = 'spamublock' WHERE rc_comment LIKE '%spamublock.js%';
UPDATE rev_table SET tool = 'spamuserpage' WHERE rc_comment LIKE '%WP:SUPG|SUPG%';
UPDATE rev_table SET tool = 'csdh' WHERE rc_comment LIKE '%Scripts|CSDH%';
UPDATE rev_table SET tool = 'cat-a-lot' WHERE rc_comment LIKE '%[[Help:Cat-a-lot|Cat-a-lot]]%';
UPDATE rev_table SET tool = 'autoed' WHERE rc_comment LIKE '%|AutoEd]]%';
UPDATE rev_table SET tool = 'refill' WHERE rc_comment LIKE '%[[:en:WP:REFILL|reFill]]%';
UPDATE rev_table SET tool = 'hotcat' WHERE rc_comment LIKE '%WP:HC|HotCat%';
UPDATE rev_table SET tool = 'mosnumscript' WHERE rc_comment LIKE '%WP:MOSNUMscript%';
UPDATE rev_table SET tool = 'reflinks' WHERE rc_comment LIKE '%WP:REFLINKS|Reflinks]%';
UPDATE rev_table SET tool = 'cenpop' WHERE rc_comment LIKE '%via CenPop%';
UPDATE rev_table SET tool = 'ohconfucius' WHERE rc_comment LIKE '%User:Ohconfucius/script|Script%';
UPDATE rev_table SET tool = 'gregu' WHERE rc_comment LIKE '%User:GregU/dashes.js|script%';
UPDATE rev_table SET tool = 'wpadventure' WHERE rc_comment LIKE '%simulated automatically as part of [[WP:The Wikipedia Adventure%';
Determining Huggle and Twinkle messages types by revision comment
UPDATE rev_table SET tool = 'huggle' WHERE rev_comment LIKE '%WP:HG%' OR rev_comment LIKE '%WP:HUGGLE%';
UPDATE rev_table SET tool = 'twinkle' WHERE rev_comment LIKE '%WP:TW%';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level1' WHERE rev_comment LIKE 'Message re.%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level2' WHERE rev_comment LIKE 'Level 2 warning%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level3' WHERE rev_comment LIKE 'Level 3 warning%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level4' WHERE rev_comment LIKE 'Level 4 warning%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level1' WHERE rev_comment LIKE 'General note:%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level2' WHERE rev_comment LIKE 'Caution:%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level3' WHERE rev_comment LIKE 'Warning:%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level4' WHERE rev_comment LIKE 'Final warning:%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'csd' WHERE rev_comment LIKE 'Notification: speedy deletion%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'tag' WHERE rev_comment LIKE 'Notification: tagging for deletion%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'prop' WHERE rev_comment LIKE 'Notification: proposed deletion%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'listing' WHERE rev_comment LIKE 'Notification: listing at%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'talkback' WHERE rev_comment LIKE 'Talkback%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'blocked' WHERE rev_comment LIKE 'You have been%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'blocked' WHERE rev_comment LIKE 'Your IP address has been blocked%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'welcome' WHERE rev_comment LIKE 'Added welcome%' AND tool = 'twinkle';
UPDATE rev_table SET msg_type = 'welcome' WHERE rev_comment LIKE 'Welcoming%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'csd' WHERE rev_comment LIKE 'Notification: Speedy deletion%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'prop' WHERE rev_comment LIKE 'Notification: Proposed deletion%' AND tool = 'huggle';
UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'blocked' WHERE rev_comment LIKE 'Notification: Blocked%' AND tool = 'huggle';
Deletion count by day (Mainspace)
SELECT
DATE( log_timestamp ) AS Date,
COUNT( log_id ) AS deletions
FROM logging
WHERE log_action='delete'
AND log_namespace =0
GROUP BY Date( log_timestamp )
List of users with a certain log action (not) among global users
For instance, a query to have a list of autocreated accounts (which have an entry in autocreate log) which are not known by CentralAuth (bugzilla:61876):
SELECT
log_title,
log_user,
log_timestamp
FROM logging
JOIN user ON logging.log_action = 'autocreate'
AND log_timestamp > 20130831000000
AND logging.log_user = user.user_id /* user_name and lu_name are normalised, log_* contains e.g. underscores */
WHERE user.user_name NOT IN (
SELECT lu_name
FROM centralauth_p.localuser lu
WHERE lu.lu_wiki = 'metawiki'
);
Editors of a page by edit count
Something like this can be used for instance to quickly list users who (don't) meet requirements for voting in a poll, though usually requirements are about more than edit count (cf. accounteligibility tool for complex examples).
SELECT DISTINCT
rev_user_text,
user_editcount
FROM revision
LEFT JOIN user /* This will leave rows for unregistered users too */
ON revision.rev_user = user.user_id
WHERE revision.rev_page = 1000 /* Insert page ID here, e.g. from action=info */
ORDER BY user_editcount DESC;
Frequency of a log action by day
Replace "create" (unregistered user signup) with your log action; and "8) AS DAY" with "6) AS MONTH" and so on for lower resolution.
SELECT
DAY,
ACTIONS
FROM (SELECT
SUBSTR(log_timestamp, 1, 8) AS DAY,
SUM(log_action = 'create') AS ACTIONS
FROM logging
GROUP BY DAY
ORDER BY log_id DESC
LIMIT 30)
AS logs;
Number of edits prevented by AbuseFilter
SELECT
SUBSTR(afl_timestamp, 1, 6) AS MONTH,
COUNT(afl_id) AS edits
FROM abuse_filter_log
WHERE afl_action = 'edit'
AND afl_actions LIKE '%disallow%'
GROUP BY MONTH;
Number of edits hindered by AbuseFilter
SELECT
SUBSTR(afl_timestamp, 1, 6) AS MONTH,
COUNT(afl_id) AS edits
FROM abuse_filter_log
WHERE afl_action = 'edit'
AND afl_actions RLIKE '.*(throttle|warn|disallow).*'
GROUP BY MONTH;
Number of users hit by AbuseFilter
SELECT
MONTH,
USERS
FROM (SELECT
SUBSTR(afl_timestamp, 1, 6) AS MONTH,
COUNT(DISTINCT(afl_user_text)) AS USERS
FROM abuse_filter_log
WHERE afl_actions RLIKE '.*(throttle|warn|disallow|blockautopromote|block|degroup).*'
GROUP BY MONTH
ORDER BY afl_id ASC)
AS logs;
Edits per month with given tag and editcount
In the example, VisualEditor edits by users with a (current!) edit count of at least 1000 edits.
SELECT
SUBSTR(rev_timestamp, 1, 6) AS MONTH,
COUNT(rev_id) AS edits
FROM revision
JOIN change_tag ON ct_rev_id = rev_id
JOIN user ON rev_user = user_id
WHERE ct_tag = 'visualeditor'
AND user_editcount > 1000
GROUP BY MONTH;
Edits per month with two given tags
In the example, VisualEditor edits tagged "nowiki" (by AbuseFilter).
SELECT
SUBSTR(rev_timestamp, 1, 6) AS MONTH,
COUNT(rev_id) AS edits
FROM change_tag as a
JOIN change_tag as b ON a.ct_rev_id = b.ct_rev_id
JOIN revision ON rev_id = a.ct_rev_id
WHERE a.ct_tag = 'nowiki'
AND b.ct_tag = 'visualeditor'
GROUP BY MONTH;
Survived editors
Editor retention by month considering time between first and last edit, excluding bots, ordered by month of first edit.
SELECT
CONCAT(month, '/', year) monthly,
SUM(period < 7) less_than_1week,
SUM(period BETWEEN 7 AND 30) 1week_to_1month,
SUM(period BETWEEN 31 AND 365) 1month_to_1year,
SUM(period > 365) more_than_1year
FROM (
SELECT
SUBSTR(first, 5, 2) month,
SUBSTR(first, 1, 4) year,
TIMESTAMPDIFF(DAY, first, last) period
FROM (
SELECT
MIN(rev_timestamp) first,
MAX(rev_timestamp) last
FROM revision_userindex
WHERE rev_user != 0 AND rev_user NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = 'bot')
GROUP BY rev_user
) a
) b
GROUP BY monthly
ORDER BY year, month;
Top most active editors under some conditions
For instance the following gives the top 50 most active translators (editors of the "Translations" namespace, where available). A simple JOIN should give most of the filtering options you'll need.
SELECT
rev_user_text, COUNT(*)
FROM revision
JOIN page
ON rev_page = page_id
WHERE page_namespace = 1198
/* Other filters like page suffix and year
AND page_title LIKE '%/it'
AND rev_timestamp LIKE '2014%' */
GROUP BY rev_user_text
ORDER BY COUNT(*) DESC
LIMIT 50;
Or, if you want to count log actions and filter e.g. by current user group, in the example number of deletions by sysops:
SELECT
log_user_text, COUNT(*)
FROM logging
JOIN user_groups ON ug_user = log_user
WHERE log_type = 'delete'/*
AND log_action = 'delete'
AND log_timestamp > 20130101000000*/
AND ug_group = 'sysop'
GROUP BY log_user_text
ORDER BY COUNT(*) DESC
LIMIT 50;
Most edited pages in last month
SELECT rc_title, count(*) as num_edits
FROM recentchanges
WHERE rc_namespace = 0
GROUP BY 1 ORDER BY 2 DESC
LIMIT 25;
Regular expressions
Is an IPv4 address
This query matches 999.999.999.999 as well as valid addresses like 129.47.61.5, but not IPv6 addresses.
REGEXP '^[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}$'
Is an IPv4 or IPv6 address
This realistically matches only IP addresses, but doesn't check the 8th IPv6 group. Note that, for a brief time, MediaWiki allowed users to register with IP-looking usernames.
REGEXP '(^([0-9]{1,3}[.]){3}[0-9]{1,3}$|^([[:alnum:]]{1,4}[:]){7})'
Number of Properties creations per creator
USE wikidatawiki_p;
SELECT
a.actor_name,
COUNT(*) as count
FROM
page p,
revision rf,
actor_user a
WHERE
rf.rev_page = p.page_id
AND rf.rev_parent_id = 0
AND a.actor_id = rf.rev_actor
AND p.page_namespace = 120 -- Property namespace
GROUP BY
a.actor_name
ORDER BY
count desc ;
Explanation: On wikidatawiki, number of Properties (P#####) creations per creator (Admin + Property creator) since the existence of Wikidata. The results do not include properties that have been deleted. Some results cannot be consulted directly from Wikidata due to the length of the histories.