Invenio-Index-Migrator


Elasticsearch index migrator for Invenio.

Further documentation is available at https://invenio-index-migrator.readthedocs.io/.

User’s Guide

This part of the documentation shows you how to get started with Invenio-Index-Migrator.

Installation

Invenio-Index-Migrator is on PyPI. When you install invenio-index-migrator, you must specify the extras dependency that matches the version of Elasticsearch you use:

$ # For Elasticsearch 2.x:
$ pip install invenio-index-migrator[elasticsearch2]

$ # For Elasticsearch 5.x:
$ pip install invenio-index-migrator[elasticsearch5]

$ # For Elasticsearch 6.x:
$ pip install invenio-index-migrator[elasticsearch6]

Configuration

Invenio module for information retrieval.

invenio_index_migrator.config.INDEX_MIGRATOR_RECIPES = {}

Index sync job definitions.

Example:

INDEX_MIGRATOR_RECIPES = dict(
    records=dict(
        cls='invenio_index_migrator.api.Migration',
        params=dict(
            strategy='cross_cluster_strategy',
            src_es_client=dict(
                prefix='',
                version=2,
                params=dict(
                    host='es2',
                    port=9200,
                    use_ssl=True,
                    http_auth='user:pass',
                    url_prefix='on-demand',
                ),
            ),
            jobs=dict(
                records_simple_reindex=dict(
                    cls='invenio_index_migrator.api.ReindexJob',
                    pid_type='recid',
                    index='records-record-v1.0.0',
                    rollover_threshold=10,
                    reindex_params=dict(
                        script=dict(
                            source="if (ctx._source.foo == 'bar') {...}",
                            lang='painless'
                        ),
                        source=dict(
                            sort=dict(
                                date='desc'
                            )
                        ),
                        dest=dict(
                            op_type='create'
                        ),
                    ),
                )
            )
        )
    )
)
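A recipe nests three levels deep: the recipe name, then its `cls` and `params`, then the `jobs` inside `params`. The helper below is a hypothetical sanity check, not part of Invenio-Index-Migrator, that walks a trimmed copy of the example recipe and lists every job it defines. The rule that each recipe and each job must name a `cls` is an assumption of this sketch.

```python
# Trimmed copy of the example recipe (connection details and reindex params omitted).
RECIPES = dict(
    records=dict(
        cls='invenio_index_migrator.api.Migration',
        params=dict(
            strategy='cross_cluster_strategy',
            jobs=dict(
                records_simple_reindex=dict(
                    cls='invenio_index_migrator.api.ReindexJob',
                    pid_type='recid',
                    index='records-record-v1.0.0',
                ),
            ),
        ),
    ),
)


def iter_recipe_jobs(recipes):
    """Yield (recipe name, job name, job class path, index) for every job.

    Hypothetical helper: raises ValueError when a recipe or job lacks `cls`.
    """
    for recipe_name, recipe in recipes.items():
        if 'cls' not in recipe:
            raise ValueError('recipe {!r} has no migration class'.format(recipe_name))
        jobs = recipe.get('params', {}).get('jobs', {})
        for job_name, job in jobs.items():
            if 'cls' not in job:
                raise ValueError('job {!r} has no job class'.format(job_name))
            yield recipe_name, job_name, job['cls'], job.get('index')
```

Running it over `INDEX_MIGRATOR_RECIPES` before starting a migration is a cheap way to catch a misplaced `jobs` key.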

Usage

Index syncing module.

Example application

Start the Elasticsearch and Redis servers.

Set up the example development environment:

$ pip install -e .[all]
$ cd examples
$ ./app-setup.sh
$ ./app-fixtures.sh

Run the example development server:

$ FLASK_DEBUG=1 FLASK_APP=app.py flask run -p 5000

Try some search queries:

$ curl http://localhost:5000/?q=body:test

To uninstall the example app:

$ ./app-teardown.sh

API Reference

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

API Docs

Index syncing module.

class invenio_index_migrator.api.Job(name, migration, config)

Index migration job.

Initialize a migration job.

Parameters:
  • name – job’s name.
  • migration – an invenio_index_migrator.api.migration.Migration object.
  • config – job’s configuration.

cancel()

Cancel the job.

create_index(index)

Create indexes needed for the job.

document_name

Get the document name for the job.

rollover_actions()

Rollover actions.

run()

Run the job.

src_es_client

Get the source ES client.

status()

Return the status of the job.

class invenio_index_migrator.api.Migration(name, **config)

Index migration base class.

cancel()

Cancel migration and all its jobs.

classmethod create_from_config(recipe_name, **recipe_config)

Create Migration instance from config.

classmethod create_from_state(recipe_name, **recipe_config)

Create Migration instance from ES state.

create_index()

Create Elasticsearch index for the migration.

init(dry_run=False)

Initialize the index with recipe and jobs documents.

load_jobs_from_config()

Load jobs from config.

notify()

Notify when rollover is possible.

Override this to notify the user whenever the threshold is reached and a rollover is possible.

rollover(force=False)

Perform a rollover action.

run()

Run the index sync job.

status()

Get status for index sync job.

strategy

Return migration strategy.

class invenio_index_migrator.api.MultiIndicesReindexJob(name, migration, config)

Reindex job that uses Elasticsearch’s reindex API.

Initialize a migration job.

Parameters:
  • name – job’s name.
  • migration – an invenio_index_migrator.api.migration.Migration object.
  • config – job’s configuration.

create_index(index)

Create templates.

initial_state(dry_run=False)

Build job’s initial state.

rollover_actions()

Rollover actions.

run()

Fetch source index using ES Reindex API.

class invenio_index_migrator.api.ReindexJob(name, migration, config)

Reindex job that uses Elasticsearch’s reindex API.

Initialize a migration job.

Parameters:
  • name – job’s name.
  • migration – an invenio_index_migrator.api.migration.Migration object.
  • config – job’s configuration.

cancel()

Cancel reindexing job.

initial_state(dry_run=False)

Build job’s initial state.

run()

Fetch source index using ES Reindex API.
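The recipe's `reindex_params` (`script`, `source`, `dest`) correspond to sections of an Elasticsearch reindex request body. A minimal sketch, assuming the job adds the source and destination index names on top of those parameters; `build_reindex_body` is a hypothetical helper and the library may assemble the request differently.

```python
import copy

# reindex_params from the example recipe (the script body is elided there too).
REINDEX_PARAMS = dict(
    script=dict(
        source="if (ctx._source.foo == 'bar') {...}",
        lang='painless',
    ),
    source=dict(
        sort=dict(date='desc'),
    ),
    dest=dict(
        op_type='create',
    ),
)


def build_reindex_body(src_index, dest_index, reindex_params=None, remote=None):
    """Merge recipe reindex_params with index names (hypothetical helper)."""
    body = copy.deepcopy(reindex_params or {})  # never mutate the recipe itself
    source = body.setdefault('source', {})
    source['index'] = src_index
    if remote:
        source['remote'] = remote  # only needed when reading from another cluster
    body.setdefault('dest', {})['index'] = dest_index
    return body
```

With elasticsearch-py 5.x/6.x such a body could be submitted as a background task with `es.reindex(body=body, wait_for_completion=False)`, which returns a task ID instead of blocking. The destination index name is up to the caller; any name used with this sketch is hypothetical.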

class invenio_index_migrator.api.ReindexAndSyncJob(name, migration, config)

Job that both reindexes with ES reindex API and syncs with the DB.

The first run will use the reindex API and the subsequent runs will fetch from the database and sync the data.

Initialize a migration job.

Parameters:
  • name – job’s name.
  • migration – an invenio_index_migrator.api.migration.Migration object.
  • config – job’s configuration.

cancel()

Cancel the reindexing and syncing job.

iter_indexer_ops(start_date=None, end_date=None)

Iterate over documents that need to be reindexed.

run()

Run reindexing and syncing job.

run_delta_job()

Calculate delta from DB changes since the last update.
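The split between the first run (reindex API) and later runs (database delta) can be pictured as a sliding time window. The sketch below is hypothetical: plain dictionaries stand in for database rows, the window is half-open `[start_date, end_date)`, and deleted records map to a `delete` operation. The real `iter_indexer_ops()` queries the database and may use different rules.

```python
from datetime import datetime

# Hypothetical stand-ins for database rows: id, last-modified time, deleted flag.
RECORDS = [
    {'id': 1, 'updated': datetime(2019, 5, 1)},
    {'id': 2, 'updated': datetime(2019, 5, 10)},
    {'id': 3, 'updated': datetime(2019, 5, 12), 'deleted': True},
]


def iter_delta_ops(records, start_date=None, end_date=None):
    """Yield (operation, record id) for records changed in [start_date, end_date)."""
    for record in records:
        updated = record['updated']
        if start_date is not None and updated < start_date:
            continue  # already handled by an earlier run
        if end_date is not None and updated >= end_date:
            continue  # left for the next run
        op = 'delete' if record.get('deleted') else 'index'
        yield op, record['id']


def run_delta(records, last_sync, now):
    """Collect the operations since `last_sync` and return the next sync point."""
    ops = list(iter_delta_ops(records, start_date=last_sync, end_date=now))
    return ops, now
```

Returning `now` as the next starting point means consecutive delta runs cover adjacent windows with no gap and no overlap.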

Utilities

Utility functions for index migration.

class invenio_index_migrator.utils.ESClient(es_config)

ES client for sync jobs.

client

Return ES client.

reindex_auth

Return username and password for reindex HTTP authentication.

reindex_remote

Return ES client reindex API host.
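`reindex_remote` and `reindex_auth` presumably feed the `source.remote` section of a reindex-from-remote request, whose fields are `host`, `username` and `password`. The sketch below shows one way to derive those values from the `src_es_client.params` of the example recipe; the helper names are hypothetical and the library's own parsing may differ.

```python
# src_es_client.params from the example recipe.
SRC_PARAMS = dict(
    host='es2',
    port=9200,
    use_ssl=True,
    http_auth='user:pass',
    url_prefix='on-demand',
)


def remote_host(params):
    """Build the scheme://host:port[/prefix] URL used for a remote source."""
    scheme = 'https' if params.get('use_ssl') else 'http'
    url = '{}://{}:{}'.format(scheme, params['host'], params.get('port', 9200))
    prefix = (params.get('url_prefix') or '').strip('/')
    if prefix:
        url = '{}/{}'.format(url, prefix)
    return url


def remote_auth(params):
    """Split a 'user:pass' http_auth string into (username, password)."""
    auth = params.get('http_auth')
    if not auth:
        return None, None
    username, _, password = auth.partition(':')  # split at the first colon only
    return username, password


def remote_source(params):
    """Return the `source.remote` section of a reindex-from-remote request."""
    remote = {'host': remote_host(params)}
    username, password = remote_auth(params)
    if username:
        remote['username'] = username
        remote['password'] = password
    return remote
```

Elasticsearch (5.x/6.x) only accepts a remote host that is listed in the destination cluster's `reindex.remote.whitelist` setting, so the source cluster typically has to be added there first.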

class invenio_index_migrator.utils.State(index, document_id, client=None)

Migration ES state.

The state is stored in Elasticsearch and can be accessed similarly to a Python dictionary.

Synchronization job state in Elasticsearch.

commit(state)

Save the state to Elasticsearch.

create(initial_state, force=False)

Create state document.

read()

Fetch the current state from Elasticsearch.
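To illustrate the dictionary-like access described above, here is an in-memory stand-in in which a plain dictionary keyed by `(index, document_id)` plays the role of the Elasticsearch index. The class name, the `ValueError` on duplicate creation, the index name and the `status` values are assumptions for this sketch; the real `State` reads and writes an Elasticsearch document through its client.

```python
import copy


class InMemoryState(object):
    """Hypothetical stand-in for utils.State; a dict plays the ES index."""

    def __init__(self, index, document_id, store=None):
        self.index = index
        self.document_id = document_id
        # Maps (index, document_id) -> state document.
        self.store = {} if store is None else store

    @property
    def key(self):
        return (self.index, self.document_id)

    def create(self, initial_state, force=False):
        """Store the initial document; refuse to overwrite unless force=True."""
        if self.key in self.store and not force:
            raise ValueError('state document already exists')
        self.store[self.key] = copy.deepcopy(initial_state)

    def read(self):
        """Return a copy of the stored document."""
        return copy.deepcopy(self.store[self.key])

    def commit(self, state):
        """Replace the stored document with `state`."""
        self.store[self.key] = copy.deepcopy(state)

    def __getitem__(self, name):
        return self.read()[name]

    def __setitem__(self, name, value):
        # Read-modify-write, so every assignment is persisted immediately.
        state = self.read()
        state[name] = value
        self.commit(state)


state = InMemoryState('index-migrator', 'records')  # hypothetical index name
state.create({'status': 'INITIAL'})
state['status'] = 'RUNNING'
```

Because each assignment is committed straight away, a second object pointing at the same document sees the update, which mirrors how several workers could share one state document.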

invenio_index_migrator.utils.extract_doctype_from_mapping(mapping_fp)

Extract the doc_type from mapping filepath.
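Assuming the mapping file follows the Elasticsearch 2.x to 6.x layout, where `mappings` contains exactly one document type, the doc type can be read from the file's JSON. This is a hedged sketch with hypothetical helper names; the library may derive the type differently (for example from the file name).

```python
import json


def doctype_from_mapping(mapping):
    """Return the single document type declared under `mappings`."""
    doc_types = list(mapping.get('mappings', {}))
    if len(doc_types) != 1:
        raise ValueError('expected exactly one doc type, found {}'.format(doc_types))
    return doc_types[0]


def doctype_from_mapping_file(mapping_fp):
    """Load a JSON mapping file and return its document type."""
    with open(mapping_fp) as fp:
        return doctype_from_mapping(json.load(fp))
```

Typeless Elasticsearch 7 mappings have no type level, so this assumption does not carry over to them.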

invenio_index_migrator.utils.get_queue_size(queue)

Get the queue size.

invenio_index_migrator.utils.obj_or_import_string(value, default=None)

Import string or return object.

Parameters:
  • value – Import path or class object to instantiate.
  • default – Default object to return if the import fails.
Returns: The imported object.
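The behavior documented above is presumably how dotted `cls` strings in a recipe, such as `'invenio_index_migrator.api.ReindexJob'`, become classes. The following is a minimal reimplementation for illustration using `importlib`, not the library's code; `resolve_obj` is a hypothetical name. Strings of the form `module.attr` are imported, other values are returned unchanged, and `default` is returned when the value is `None` or the import fails.

```python
import importlib


def resolve_obj(value, default=None):
    """Import `value` if it is a 'module.attr' string, otherwise return it as-is."""
    if isinstance(value, str):
        module_path, _, attr = value.rpartition('.')
        try:
            return getattr(importlib.import_module(module_path), attr)
        except (ImportError, AttributeError, ValueError):
            # ValueError covers strings without a module part, e.g. 'json'.
            return default
    return default if value is None else value
```

With the package installed, `resolve_obj('invenio_index_migrator.api.ReindexJob')` would return the job class, while passing the class itself simply returns it.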

Additional Notes

Notes on how to contribute, legal information and changes are here for the interested.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

Types of Contributions

Report Bugs

Report bugs at https://github.com/inveniosoftware/invenio-index-migrator/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.

Write Documentation

Invenio-Index-Migrator could always use more documentation, whether as part of the official Invenio-Index-Migrator docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/inveniosoftware/invenio-index-migrator/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up invenio-index-migrator for local development.

  1. Fork the inveniosoftware/invenio-index-migrator repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/invenio-index-migrator.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv invenio-index-migrator
    $ cd invenio-index-migrator/
    $ pip install -e .[all]
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass tests:

    $ ./run-tests.sh
    

    The tests report test coverage, check PEP 8 (code style), PEP 257 (docstrings) and flake8, build the Sphinx documentation, and run the doctests.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -s \
        -m "component: title without verbs" \
        -m "* NEW Adds your new feature." \
        -m "* FIX Fixes an existing issue." \
        -m "* BETTER Improves an existing feature." \
        -m "* Changes something that should not be visible in release notes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests and must not decrease test coverage.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring.
  3. The pull request should work for Python 2.7, 3.3, 3.4 and 3.5. Check https://travis-ci.org/inveniosoftware/invenio-index-migrator/pull_requests and make sure that the tests pass for all supported Python versions.

Changes

Version 1.0.0 (released 2019-05-21)

  • Initial public release.

License

MIT License

Copyright (C) 2015-2019 CERN.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Note

In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.

Contributors

  • Alexander Ioannidis
  • Niklas Persson
  • Zacharias Zacharodimos