S3 Object Lambda using Textract

Pattern for extracting key-value pairs from documents for intelligent document processing

S3 Object Lambda Access Point, S3 Object Lambda, Amazon Textract
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  (uksb-1tthgi812) (tag:s3-object-lambda-textract)
  ObjectLambda-Textract

  Sample SAM Template for ObjectLambda-Textract

Resources:
  S3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      VersioningConfiguration:
        Status: Enabled

  # S3 Access Point (Network origin: Internet)
  S3AccessPoint:
    Type: 'AWS::S3::AccessPoint'
    Properties:
      Bucket: !Ref S3Bucket
      Name: 'mys3bucket-textract-ap'

  # S3 Object Lambda Access Point
  S3ObjectLambdaAccessPoint:
    Type: 'AWS::S3ObjectLambda::AccessPoint'
    Properties:
      Name: 'my-textract-function-olap'
      ObjectLambdaConfiguration:
        SupportingAccessPoint: !Sub 'arn:aws:s3:${AWS::Region}:${AWS::AccountId}:accesspoint/${S3AccessPoint}'
        TransformationConfigurations:
          - Actions:
              - GetObject
            ContentTransformation:
              AwsLambda:
                FunctionArn: !GetAtt ObjectLambdaFunction.Arn
                FunctionPayload: 'test-payload'

  # Lambda function
  ObjectLambdaFunction:
    Type: 'AWS::Serverless::Function'
    Properties:
      CodeUri: app/
      Handler: app.handler
      Runtime: python3.11
      MemorySize: 1024
      # The function needs permission to call back to the S3 Object Lambda Access Point with the WriteGetObjectResponse.
      Policies:
        - AmazonS3ObjectLambdaExecutionRolePolicy
        - AmazonTextractFullAccess


Outputs:
  S3BucketName:
    Value: !Ref S3Bucket
    Description: S3 Bucket for object storage.
  S3AccessPointArn:
    Value: !GetAtt S3AccessPoint.Arn
    Description: ARN of the S3 access point.
  S3ObjectLambdaAccessPointArn:
    Value: !GetAtt S3ObjectLambdaAccessPoint.Arn
    Description: ARN of the S3 Object Lambda access point.
  LambdaFunctionArn:
    Value: !GetAtt ObjectLambdaFunction.Arn
    Description: ObjectLambdaFunction ARN
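
The template points the function at `app.handler` in `app/`, but the handler itself is not shown here. The following is a minimal sketch of what such a handler could look like, not the repository's actual code. It assumes the document is a single-page image or PDF small enough for the synchronous Textract `AnalyzeDocument` call, and that the transformed object should be a JSON map of key-value pairs. The helper names `_child_text` and `extract_key_values` are hypothetical.

```python
import json


def _child_text(block, blocks_by_id):
    """Join the text of a block's CHILD words and selection marks."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def extract_key_values(blocks):
    """Build a {key text: value text} dict from Textract FORMS blocks."""
    blocks_by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    values.append(_child_text(blocks_by_id[value_id], blocks_by_id))
        pairs[_child_text(block, blocks_by_id)] = " ".join(v for v in values if v)
    return pairs


def handler(event, context):
    # Imported lazily so the pure helpers above can be tested without AWS access.
    import urllib.request
    import boto3

    ctx = event["getObjectContext"]

    # Fetch the original object through the presigned URL S3 provides.
    with urllib.request.urlopen(ctx["inputS3Url"]) as resp:
        document = resp.read()

    # Assumption: synchronous analysis with the FORMS feature is sufficient.
    result = boto3.client("textract").analyze_document(
        Document={"Bytes": document},
        FeatureTypes=["FORMS"],
    )

    # Return the transformed object to the caller of GetObject.
    boto3.client("s3").write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=json.dumps(extract_key_values(result["Blocks"])),
    )
    return {"statusCode": 200}
```

Keeping the block parsing in a pure function separates it from the AWS calls, so it can be exercised locally with a saved Textract response.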

Download

git clone https://github.com/aws-samples/serverless-patterns
cd serverless-patterns/s3-object-lambda-textract
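
Once the stack is deployed, a client reads the transformed object by passing the Object Lambda access point ARN where a bucket name would normally go. The sketch below shows one way to do that with boto3. The region, account ID, and object key are hypothetical placeholders, and `olap_arn` and `get_transformed_object` are illustrative helper names; only the access point name `my-textract-function-olap` comes from the template.

```python
def olap_arn(region, account_id, name):
    """Build the ARN of an S3 Object Lambda access point."""
    return f"arn:aws:s3-object-lambda:{region}:{account_id}:accesspoint/{name}"


def get_transformed_object(arn, key):
    """Fetch an object through the Object Lambda access point as text."""
    import boto3  # imported lazily; requires AWS credentials at call time

    s3 = boto3.client("s3")
    # The access point ARN is passed in place of a bucket name.
    response = s3.get_object(Bucket=arn, Key=key)
    return response["Body"].read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical region, account ID, and key; replace with your own values.
    arn = olap_arn("us-east-1", "111122223333", "my-textract-function-olap")
    print(get_transformed_object(arn, "sample-form.png"))
```

The ARN can also be copied from the stack's `S3ObjectLambdaAccessPointArn` output instead of being built by hand.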

Last updated on 26 Dec 2024