aws_glue_job – Manage an AWS Glue job


New in version 2.6.


Synopsis

  • Manage an AWS Glue job: create or update it with state=present, or delete it with state=absent.

Requirements

The below requirements are needed on the host that executes this module.

  • boto
  • boto3
  • python >= 2.6
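AWS modules like this one usually run on the control node against localhost. The play below is a minimal sketch that assumes this setup and a working pip on that host. It installs the Python libraries listed above with the pip module:

```yaml
# Install the boto and boto3 libraries that this module needs on the
# host that executes it (assumed here to be localhost).
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Install boto and boto3
      pip:
        name:
          - boto
          - boto3
```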

Parameters

Parameter Choices/Defaults Comments

allocated_capacity

-

The number of AWS Glue data processing units (DPUs) to allocate to this job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory.

aws_access_key

string

AWS access key. If not set then the value of the AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY or EC2_ACCESS_KEY environment variable is used.


aliases: ec2_access_key, access_key

aws_secret_key

string

AWS secret key. If not set then the value of the AWS_SECRET_ACCESS_KEY, AWS_SECRET_KEY, or EC2_SECRET_KEY environment variable is used.


aliases: ec2_secret_key, secret_key

command_name

-

Default:

"glueetl"

The name of the job command. This must be 'glueetl'.

command_script_location

- / required

The S3 path to a script that executes a job.

connections

-

A list of Glue connections used for this job.

debug_botocore_endpoint_logs

boolean

added in 2.8

  • no

  • yes

Use a botocore.endpoint logger to parse the unique (rather than total) "resource:action" API calls made during a task, outputting the set to the resource_actions key in the task results. Use the aws_resource_action callback to output the total list made during a playbook. The ANSIBLE_DEBUG_BOTOCORE_LOGS environment variable may also be used.

default_arguments

-

A dict of default arguments for this job. You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

description

-

Description of the job being defined.

ec2_url

string

URL to use to connect to EC2 or your Eucalyptus cloud (by default the module will use EC2 endpoints). Ignored for modules where region is required. Must be specified for all other modules if region is not used. If not set then the value of the EC2_URL environment variable, if any, is used.

max_concurrent_runs

-

The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

max_retries

-

The maximum number of times to retry this job if it fails.

name

- / required

The name you assign to this job definition. It must be unique in your account.

profile

string

Uses a boto profile. Only works with boto >= 2.24.0.

region

string

The AWS region to use. If not specified then the value of the AWS_REGION or EC2_REGION environment variable, if any, is used. See http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region


aliases: aws_region, ec2_region

role

- / required

The name or ARN of the IAM role associated with this job.

security_token

string

AWS STS security token. If not set then the value of the AWS_SECURITY_TOKEN or EC2_SECURITY_TOKEN environment variable is used.


aliases: access_token

state

- / required

  • present
  • absent

Create or delete the AWS Glue job.

timeout

-

The job timeout in minutes.

validate_certs

boolean

  • no
  • yes

When set to "no", SSL certificates will not be validated for boto versions >= 2.6.0.
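The task below is a sketch that combines most of the optional parameters above. The bucket, connection name, region and argument values are hypothetical placeholders. `--TempDir` and `--job-bookmark-option` are arguments consumed by AWS Glue itself. `--source_table` is a hypothetical argument that your own job script would read.

```yaml
# Create (or keep in place) a Glue job with explicit capacity,
# concurrency, retry and timeout settings.
- name: Create an AWS Glue job with optional settings
  aws_glue_job:
    name: my-glue-job
    description: Nightly ETL job
    role: my-iam-role
    command_name: glueetl
    command_script_location: s3://my-bucket/scripts/etl.py
    allocated_capacity: 10          # 2 to 100 DPUs
    max_concurrent_runs: 1
    max_retries: 2
    timeout: 60                     # minutes
    connections:
      - my-jdbc-connection          # hypothetical Glue connection name
    default_arguments:
      "--TempDir": s3://my-bucket/tmp/
      "--job-bookmark-option": job-bookmark-enable
      "--source_table": orders      # hypothetical argument read by the script
    region: us-east-1
    state: present
```

Parameters left out fall back to the defaults described above, for example 10 DPUs and one concurrent run.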



Notes


  • If parameters are not set within the module, the following environment variables can be used, in decreasing order of precedence: AWS_URL or EC2_URL; AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY or EC2_ACCESS_KEY; AWS_SECRET_ACCESS_KEY or AWS_SECRET_KEY or EC2_SECRET_KEY; AWS_SECURITY_TOKEN or EC2_SECURITY_TOKEN; AWS_REGION or EC2_REGION.
  • Ansible uses the boto configuration file (typically ~/.boto) if no credentials are provided. See https://boto.readthedocs.io/en/latest/boto_config_tut.html
  • AWS_REGION or EC2_REGION can typically be used to specify the AWS region when required, but the region can also be configured in the boto config file.
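The notes describe two ways to supply the region and credentials without hard-coding keys in the task. The first uses a named boto profile through the `profile` parameter. The second sets environment variables with the task-level `environment` keyword. The sketch below shows both. The profile name and region are placeholders.

```yaml
# Option 1: use a named profile from the boto/AWS configuration.
- name: Delete a Glue job using a named profile
  aws_glue_job:
    name: my-glue-job
    profile: my-profile          # hypothetical profile name
    region: eu-west-1
    state: absent

# Option 2: rely on environment variables (see the precedence list above).
- name: Delete a Glue job using environment variables
  aws_glue_job:
    name: my-glue-job
    state: absent
  environment:
    AWS_REGION: eu-west-1
```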


Examples

# Note: These examples do not set authentication details, see the AWS Guide for details.

# Create an AWS Glue job
- aws_glue_job:
    command_script_location: s3://s3bucket/script.py
    name: my-glue-job
    role: my-iam-role
    state: present

# Delete an AWS Glue job
- aws_glue_job:
    name: my-glue-job
    state: absent
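You can combine the `debug_botocore_endpoint_logs` parameter (added in 2.8) with `register` to see which Glue API calls a task made. This is a sketch. The registered variable name `glue_create` and the bucket path are hypothetical.

```yaml
- name: Create an AWS Glue job and record the API calls it makes
  aws_glue_job:
    command_script_location: s3://my-bucket/script.py
    name: my-glue-job
    role: my-iam-role
    state: present
    debug_botocore_endpoint_logs: yes
  register: glue_create

# resource_actions holds the unique "resource:action" calls made by the task.
- name: Show the API calls
  debug:
    var: glue_create.resource_actions
```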

Return Values

Common return values are documented in the general Ansible documentation; the following are the fields unique to this module:

Key Returned Description

allocated_capacity

integer

when state is present

The number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory.


Sample:

10

command

complex

when state is present

The JobCommand that executes this job. It contains the nested keys name and script_location, described below.


name

string

when state is present

The name of the job command.


Sample:

glueetl

script_location

string

when state is present

Specifies the S3 path to a script that executes a job.


Sample:

mybucket/myscript.py

connections

dictionary

when state is present

The connections used for this job.


Sample:

{ Connections: [ 'list', 'of', 'connections' ] }

created_on

string

when state is present

The time and date that this job definition was created.


Sample:

2018-04-21T05:19:58.326000+00:00

default_arguments

dictionary

when state is present

The default arguments for this job, specified as name-value pairs.


Sample:

{ 'mykey1': 'myvalue1' }

description

string

when state is present

Description of the job being defined.


Sample:

My first Glue job

execution_property

complex

always

An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job, through the nested key max_concurrent_runs.


max_concurrent_runs

integer

when state is present

The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.


Sample:

1

job_name

string

always

The name of the AWS Glue job.


Sample:

my-glue-job

last_modified_on

string

when state is present

The last point in time when this job definition was modified.


Sample:

2018-04-21T05:19:58.326000+00:00

max_retries

integer

when state is present

The maximum number of times to retry this job after a JobRun fails.


Sample:

5

name

string

when state is present

The name assigned to this job definition.


Sample:

my-glue-job

role

string

when state is present

The name or ARN of the IAM role associated with this job.


Sample:

my-iam-role

timeout

integer

when state is present

The job timeout in minutes.


Sample:

300
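If you register the module result, you can read the fields above from the registered variable. In the sketch below, `glue_job` is a hypothetical variable name. The checks assume the job was created or updated with the parameters shown, so the returned values mirror them.

```yaml
- name: Ensure the Glue job exists
  aws_glue_job:
    command_script_location: s3://my-bucket/script.py
    name: my-glue-job
    role: my-iam-role
    max_retries: 5
    state: present
  register: glue_job

# Nested fields such as command.script_location are reached with dot syntax.
- name: Check selected return values
  assert:
    that:
      - glue_job.name == "my-glue-job"
      - glue_job.max_retries == 5
      - glue_job.command.script_location == "s3://my-bucket/script.py"
      - glue_job.execution_property.max_concurrent_runs >= 1
```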




Status

Authors

  • Rob White (@wimnat)



© 2012–2018 Michael DeHaan
© 2018–2019 Red Hat, Inc.
Licensed under the GNU General Public License version 3.
https://docs.ansible.com/ansible/2.9/modules/aws_glue_job_module.html