Getting Started

  1. Clone ACHE’s repository:

    git clone https://github.com/ViDA-NYU/ache.git
    
  2. Compile and install ACHE (binaries will be available at ache/build/install/ache/bin/):

    cd ache
    ./gradlew installDist
    
  3. For convenience, let’s create an alias for ACHE’s command line interface:

    alias ache=./build/install/ache/bin/ache
    
  4. Now let’s start ACHE using the pre-trained page classifier model and the seed URL list included in the repository (press Ctrl+C at any time to stop the crawler):

    ache startCrawl -c config/config_focused_crawl/ -m config/sample_model/ -o data -s config/sample.seeds
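To crawl from your own starting points instead of config/sample.seeds, you can write a seeds file and pass it to startCrawl with -s. The script below is a minimal sketch. It assumes the seeds file is plain text with one URL per line, like the sample file in the repository. The file name my.seeds and the example URLs are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: create a custom seed list for ACHE.
# Assumption: the seeds file is plain text with one URL per line,
# like config/sample.seeds in the repository.
set -euo pipefail

seeds_file="my.seeds"   # hypothetical file name, pick any path you like

# Write the seed URLs (placeholders; replace with your own sites).
cat > "$seeds_file" <<'EOF'
https://www.example.com/
https://www.example.org/
EOF

# Print any line that does not look like an http(s) URL, then warn.
if grep -vE '^https?://' "$seeds_file"; then
  echo "warning: the lines above are not http(s) URLs" >&2
fi

# Count the lines written (tr strips the padding some wc versions add).
count=$(wc -l < "$seeds_file" | tr -d '[:space:]')
echo "$count seed URLs written to $seeds_file"
```

Then start the crawl as in step 4, changing only the seeds path:

    ache startCrawl -c config/config_focused_crawl/ -m config/sample_model/ -o data -s my.seeds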