How to Crawl Your Website Automatically Using GitHub Actions
This document introduces how to configure Algolia DocSearch for your Docusaurus website and crawl your website automatically using GitHub Actions.
Step 1: Apply for Algolia DocSearch
Algolia DocSearch provides a free service for open-source projects. If your website is open-source, you can apply for Algolia DocSearch by filling out the form. For more details, refer to Who can apply for DocSearch?.
After applying, you will receive an invitation email from Algolia DocSearch. After accepting the invitation, you can manage your website crawler in the Algolia Crawler and access your data in the Algolia Dashboard.
Step 2: Configure Algolia in Docusaurus
In the `docusaurus.config.js` file, add the following configuration:
```js
/** @type {import('@docusaurus/types').Config} */
const config = {
  // ...
  themeConfig:
    /** @type {import('@docusaurus/preset-classic').ThemeConfig} */
    ({
      // ...
      algolia: {
        appId: 'ALGOLIA_APPLICATION_ID',
        apiKey: 'ALGOLIA_SEARCH_API_KEY',
        indexName: 'ALGOLIA_INDEX_NAME',
      },
    }),
};

module.exports = config;
```
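If you prefer not to hard-code these values, you can read them from environment variables instead. A minimal sketch, assuming you export `ALGOLIA_APP_ID`, `ALGOLIA_SEARCH_API_KEY`, and `ALGOLIA_INDEX_NAME` at build time (these variable names are illustrative, not a Docusaurus convention):

```js
/** @type {import('@docusaurus/types').Config} */
const config = {
  // ...
  themeConfig: {
    // ...
    algolia: {
      // Fall back to the literal placeholders so local builds still start
      // when the variables are not set.
      appId: process.env.ALGOLIA_APP_ID || 'ALGOLIA_APPLICATION_ID',
      apiKey: process.env.ALGOLIA_SEARCH_API_KEY || 'ALGOLIA_SEARCH_API_KEY',
      indexName: process.env.ALGOLIA_INDEX_NAME || 'ALGOLIA_INDEX_NAME',
    },
  },
};

module.exports = config;
```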
To get the `appId` and `apiKey`, follow the steps below:
- Log in to the Algolia Dashboard and navigate to the API Keys page.
- On the API Keys page, select your Application to see the Application ID (`appId`) and the Search API Key (`apiKey`).
The `apiKey` is a public, search-only key, so it is safe to commit to your repository.
To get the `indexName`, follow the steps below:
- Log in to the Algolia Dashboard.
- In the left navigation pane, click Data sources and then click Indices.
- On the Indices page, select your Application to see all of its indices. The Index field corresponds to `indexName`.
For more details, refer to Docusaurus: connecting Algolia and DocSearch: API reference.
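Before committing the configuration, you can sanity-check the three values with a direct query against your index. A minimal sketch using the `algoliasearch` JavaScript client (v4 API; the empty query simply returns the first page of records):

```js
const algoliasearch = require('algoliasearch');

// The three values collected in the steps above.
const client = algoliasearch('ALGOLIA_APPLICATION_ID', 'ALGOLIA_SEARCH_API_KEY');
const index = client.initIndex('ALGOLIA_INDEX_NAME');

index
  .search('')
  .then(({ nbHits }) => console.log(`Index reachable: ${nbHits} records`))
  .catch((err) => console.error('Check your credentials:', err.message));
```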
Step 3: Crawl your website automatically
How often will DocSearch crawl my website?
Crawls are scheduled at a random time once a week. You can configure this schedule from the crawler's config file or trigger a crawl manually from the Crawler interface.
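For reference, the schedule lives in the crawler's configuration, which you can edit in the Algolia Crawler editor. A minimal sketch of the relevant part, assuming a weekly schedule (see the Crawler documentation for the exact schedule syntax; the other fields are elided):

```js
new Crawler({
  appId: 'ALGOLIA_APPLICATION_ID',
  apiKey: 'ALGOLIA_API_KEY',
  // Run a crawl automatically once a week.
  schedule: 'every 1 week',
  // ... startUrls, actions, and other site-specific settings go here.
});
```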
If you want to trigger the Algolia Crawler every time you update your website, you can use the `algoliasearch-crawler-github-actions` GitHub Action.
- Get the Crawler User ID, Crawler API Key, Application ID, and API Key of your Algolia account. The Application ID and API Key are the same as the `appId` and `apiKey` in the previous step. To get the Crawler User ID and Crawler API Key, follow the steps below:
  - Log in to the Algolia Crawler and navigate to the Account settings page.
  - On the Account settings page, you can see the Crawler User ID (`ALGOLIA_CRAWLER_USER_ID`) and the Crawler API Key (`ALGOLIA_CRAWLER_API_KEY`).
- Configure the following secrets on the Settings > Secrets and variables > Actions page of your GitHub repository:
  - `ALGOLIA_CRAWLER_USER_ID`: the Crawler User ID of your Algolia Crawler account.
  - `ALGOLIA_CRAWLER_API_KEY`: the Crawler API Key of your Algolia Crawler account.
  - `ALGOLIA_APPLICATION_ID`: the Application ID of your Algolia account.
  - `ALGOLIA_API_KEY`: the API Key of your Algolia account.
- Create a new workflow in your GitHub repository. The following example shows how to trigger the Algolia Crawler when you push to the `main` branch or when you trigger the workflow manually.

  .github/workflows/crawl.yml

  ```yaml
  name: Crawl

  on:
    push:
      branches: [ "main" ]
    workflow_dispatch:

  jobs:
    crawl:
      runs-on: ubuntu-latest
      steps:
        - name: Algolia Crawler Automatic Crawl
          # Pin to the release of the action you want to use.
          uses: algolia/algoliasearch-crawler-github-actions@v1
          with:
            crawler-user-id: ${{ secrets.ALGOLIA_CRAWLER_USER_ID }}
            crawler-api-key: ${{ secrets.ALGOLIA_CRAWLER_API_KEY }}
            github-token: ${{ github.token }}
            # Change YOUR_CRAWLER_NAME to the name of your crawler
            crawler-name: YOUR_CRAWLER_NAME
            algolia-app-id: ${{ secrets.ALGOLIA_APPLICATION_ID }}
            algolia-api-key: ${{ secrets.ALGOLIA_API_KEY }}
            # Change YOUR_SITE_URL to your own site URL
            site-url: YOUR_SITE_URL
  ```
- Trigger the workflow manually or push a commit to the `main` branch. After the `Crawl` workflow completes, you can check the crawl results in the Algolia Crawler.
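The GitHub Action drives the Algolia Crawler REST API, so you can also trigger a crawl outside of CI with the same credentials. A minimal sketch in Node.js (18+ for the built-in `fetch`), assuming the Crawler API's reindex endpoint and a hypothetical crawler ID, which you can copy from your Crawler dashboard URL:

```js
// Trigger a reindex through the Algolia Crawler REST API.
const crawlerId = 'YOUR_CRAWLER_ID'; // hypothetical; copy it from the Crawler dashboard URL

// The Crawler API uses HTTP Basic auth with the Crawler User ID and API Key.
const auth = Buffer.from(
  `${process.env.ALGOLIA_CRAWLER_USER_ID}:${process.env.ALGOLIA_CRAWLER_API_KEY}`
).toString('base64');

fetch(`https://crawler.algolia.com/api/1/crawlers/${crawlerId}/reindex`, {
  method: 'POST',
  headers: { Authorization: `Basic ${auth}` },
})
  .then((res) => console.log('Reindex requested:', res.status))
  .catch((err) => console.error(err));
```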