TextractDocumentAnalysis

Description

This activity analyzes a file in more detail than TextractDetectText, using the Amazon AWS Textract Analyze Document service.

With TextractDocumentAnalysis you can read the forms in a document as key-value pairs and the data contained in its tables as DataTable objects. You must enable at least one of the two modes (using the respective arguments), otherwise an exception is thrown. To analyze a document without extracting tables or forms, use the TextractDetectText activity.
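The mode check can be sketched in Python as follows. This assumes the two Boolean arguments map onto the `FeatureTypes` parameter of the Textract `AnalyzeDocument` API (values `"FORMS"` and `"TABLES"`). The helper name `build_feature_types` is hypothetical and is not part of the activity.

```python
from typing import List


def build_feature_types(detect_forms: bool, detect_tables: bool) -> List[str]:
    """Hypothetical helper: translate the two mode flags into Textract FeatureTypes."""
    feature_types = []
    if detect_forms:
        feature_types.append("FORMS")
    if detect_tables:
        feature_types.append("TABLES")
    if not feature_types:
        # Mirrors the documented behaviour: at least one mode must be enabled.
        raise ValueError(
            "Set DetectForms and/or DetectTables to True; "
            "use TextractDetectText for plain text detection."
        )
    return feature_types


print(build_feature_types(True, True))  # ['FORMS', 'TABLES']
```

Raising early keeps the request from reaching the service when there is nothing to analyze, which matches the exception described above.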

The Amazon AWS Textract service is not free, and its cost depends on the type of analysis you perform: forms only, tables only, or both. See the Amazon Textract pricing page for details.

The activity needs an input file, which can be provided in two ways: as an IFileValue, or by referencing a file previously uploaded to S3 (bucket name and file name). See the S3 activities documentation to learn how to manage files in S3.
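In the raw Textract API, these two input options correspond to the two shapes of the `Document` parameter of `AnalyzeDocument`: inline `Bytes`, or an `S3Object` with `Bucket` and `Name`. The minimal Python sketch below builds that parameter. The helper name `build_document` and the rule that inline content takes precedence over an S3 reference are assumptions made for illustration, not documented behaviour of the activity.

```python
from typing import Optional


def build_document(file_bytes: Optional[bytes] = None,
                   bucket: Optional[str] = None,
                   name: Optional[str] = None) -> dict:
    """Hypothetical helper: build the Document parameter for AnalyzeDocument."""
    if file_bytes is not None:
        # Inline content: the raw file is sent together with the request.
        return {"Bytes": file_bytes}
    if bucket and name:
        # Reference to an object that already exists in S3.
        return {"S3Object": {"Bucket": bucket, "Name": name}}
    raise ValueError("Provide either the file content or both bucket and file name.")


print(build_document(bucket="my-bucket", name="invoice.png"))
```

With boto3, the result would be passed as `boto3.client("textract").analyze_document(Document=..., FeatureTypes=["FORMS"])`; the bucket and file names above are placeholders.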

If you don't know how to obtain your AWS credentials, have a look at the Activity Library AWS documentation.

AWS Credentials

AccessKey InArgument<String> REQUIRED

The access key for Amazon AWS authentication.

SecretKey InArgument<String> REQUIRED

The secret key for Amazon AWS authentication.


Features Type

DetectForms InArgument<Boolean>

Default is False. Set it to True to detect forms.

DetectTables InArgument<Boolean>

Default is False. Set it to True to detect tables.


File

The file you want to analyze.

S3BucketName InArgument<String> REQUIRED

The name of the bucket where the file you want to analyze is uploaded.

S3FileName InArgument<String>

The name of the file uploaded to S3 that you want to analyze.


Output

The list of the detected blocks
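The exact type of this output is not stated here. Assuming it mirrors the `Blocks` array of the raw Textract response, where every element has a `BlockType` such as `PAGE`, `LINE`, `WORD`, `KEY_VALUE_SET`, `TABLE` or `CELL`, the Python sketch below tallies blocks by type. The sample data is hand-written and simplified; real blocks also carry fields such as `Geometry` and `Confidence`.

```python
from collections import Counter
from typing import List


def count_block_types(blocks: List[dict]) -> Counter:
    """Count how many blocks of each BlockType a response contains."""
    return Counter(block["BlockType"] for block in blocks)


# Hand-written, simplified sample blocks.
sample_blocks = [
    {"Id": "p1", "BlockType": "PAGE"},
    {"Id": "l1", "BlockType": "LINE", "Text": "Name: Jane Doe"},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
]
counts = count_block_types(sample_blocks)
print(counts["WORD"], counts["PAGE"])  # 3 1
```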

A list of key-value pairs representing the detected forms. If the DetectForms argument is set to False, this argument is always left null; set DetectForms to True to populate it.
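To show where these pairs come from, here is a minimal Python sketch that rebuilds key-value pairs from raw `AnalyzeDocument` blocks. It assumes the documented response layout: a `KEY_VALUE_SET` block with `EntityTypes` containing `"KEY"` links to its value through a `VALUE` relationship, and both link to their words through `CHILD` relationships. The function names are hypothetical, and the sketch is not the activity's own implementation.

```python
from typing import Dict, List, Tuple


def text_of(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of the WORD blocks referenced as CHILD of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(blocks: List[dict]) -> List[Tuple[str, str]]:
    """Pair every KEY block with the text of its linked VALUE block(s)."""
    blocks_by_id = {b["Id"]: b for b in blocks}
    pairs = []
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key = text_of(block, blocks_by_id)
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_parts.extend(text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"])
        pairs.append((key, " ".join(value_parts)))
    return pairs


# Hand-written, simplified blocks for a single "Name: Jane Doe" field.
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
]
print(extract_key_values(sample_blocks))  # [('Name:', 'Jane Doe')]
```

A list of tuples is used rather than a dictionary so that repeated keys on the same form are preserved. Checkbox values (`SELECTION_ELEMENT` blocks) are ignored by this sketch.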

PageCount OutArgument<Int32>

The number of pages analyzed.

A list of AWSTextractTable objects representing the detected tables. If the DetectTables argument is set to False, this argument is always left null; set DetectTables to True to populate it.
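The following Python sketch shows how a table can be rebuilt from raw `AnalyzeDocument` blocks. It assumes the documented layout: a `TABLE` block lists its `CELL` blocks as `CHILD` relationships, and each cell has 1-based `RowIndex` and `ColumnIndex` values. The grid of strings stands in for the activity's AWSTextractTable objects, whose internals are not described here. Merged cells (`MERGED_CELL` blocks) and table titles are not handled, and the function names are hypothetical.

```python
from typing import Dict, List


def text_of(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of the WORD blocks referenced as CHILD of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_tables(blocks: List[dict]) -> List[List[List[str]]]:
    """Return each TABLE block as a grid of strings (rows x columns)."""
    blocks_by_id = {b["Id"]: b for b in blocks}
    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [
            blocks_by_id[i]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for i in rel["Ids"]
            if blocks_by_id[i]["BlockType"] == "CELL"
        ]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            # RowIndex and ColumnIndex are 1-based in the Textract response.
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = text_of(cell, blocks_by_id)
        tables.append(grid)
    return tables


def cell(cell_id: str, row: int, col: int, word_id: str) -> dict:
    """Build a simplified CELL block for the hand-written sample below."""
    return {"Id": cell_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": [word_id]}]}


# Hand-written, simplified 2x2 table.
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    cell("c1", 1, 1, "w1"), cell("c2", 1, 2, "w2"),
    cell("c3", 2, 1, "w3"), cell("c4", 2, 2, "w4"),
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Apples"},
    {"Id": "w4", "BlockType": "WORD", "Text": "3"},
]
print(extract_tables(sample_blocks))  # [[['Item', 'Qty'], ['Apples', '3']]]
```

Filtering the children by `BlockType == "CELL"` keeps the grid logic independent of any other block types a table may reference.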


Settings

RegionEndpoint EAWSRegionEndpoint

An AWS Region is a collection of AWS resources in a geographic area. Each AWS Region is isolated and independent of the other Regions.

Have a look at the official AWS page for more information.