This project automates the transcription and summarization of audio, video, and text content using AWS services. By leveraging Amazon Transcribe for speech-to-text conversion, Amazon Textract for OCR of printed or handwritten text, and Amazon Bedrock for generating summaries with foundation GenAI models, the tool streamlines the analysis of multimedia and textual information. It also supports text anonymization, along with other handy features such as reading from HTTP(S) URLs and PDF text extraction.
- Demonstration Purposes: This code is intended for demonstration purposes ONLY. It will incur AWS charges based on the usage of Amazon Bedrock, Amazon Textract, Amazon Transcribe, Amazon S3, and other AWS services this project may use in the future. Please review AWS pricing details for each service used.
The application employs the Chain of Responsibility design pattern to process inputs through a series of handlers. Each handler in the chain is responsible for a specific type of task, such as downloading YouTube videos, transcribing audio, summarizing text, or handling files from the local file system or Amazon S3. This pattern provides flexibility in processing and allows for easy customization of the processing pipeline. It has been enhanced with dynamic handler discovery using the Factory pattern to automatically identify and instantiate handlers as needed, significantly simplifying the extension and customization of the processing pipeline.
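The Chain of Responsibility pattern can be sketched as follows. This is an illustrative minimal version; the project's actual handler interface and class names may differ, but the fluent `set_next()` chaining matches the usage shown later in this README.

```python
class Handler:
    """Minimal Chain of Responsibility base class (illustrative sketch)."""

    def __init__(self):
        self._next = None

    def set_next(self, handler):
        # Returning the next handler enables fluent chaining:
        # a.set_next(b).set_next(c)
        self._next = handler
        return handler

    def handle(self, request):
        # Process the request, then pass the result down the chain
        request = self.process(request)
        if self._next:
            return self._next.handle(request)
        return request

    def process(self, request):
        # Subclasses override this with their specific task
        return request


class AppendHandler(Handler):
    """Toy handler used only to demonstrate chaining."""

    def __init__(self, suffix):
        super().__init__()
        self.suffix = suffix

    def process(self, request):
        request["text"] += self.suffix
        return request


first = AppendHandler(" world")
first.set_next(AppendHandler("!"))
result = first.handle({"text": "hello"})
# result["text"] is "hello world!"
```

Each concrete handler only implements its own `process()` step, so new processing stages can be inserted anywhere in the chain without touching the others.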
- Amazon Transcribe: Converts spoken words in audio or video files into text, producing accurate transcriptions.
- Amazon Textract: Automatically extracts printed text, handwriting, layout elements, and data from documents.
- Amazon Comprehend: Natural-language processing (NLP) service used to extract valuable insights and PII data in text.
- Amazon Bedrock: Employs advanced AI models to summarize text, making it easier to digest large volumes of information.
- Amazon S3: Acts as a storage solution for the input files and the generated outputs, including transcripts and summaries.
- Others: third-party services such as Quip.
The processing chain is composed of several handlers, each dedicated to a particular processing step and organized into three groups: readers, processors, and writers.
Readers:
- LocalFileReaderHandler: Handles local audio, video, and text files for processing.
- S3ReaderHandler: Manages the reading and downloading of S3 objects (files) from Amazon S3.
- HTTPHandler: Generic HTTP handler that fetches HTML data from HTTP(S) endpoints. It uses BeautifulSoup to clean HTML tags.
- PDFReaderHandler: Extracts text from PDF documents for summarization.
- MicrosoftExcelReaderHandler: Extracts text from Microsoft Excel documents.
- MicrosoftWordReaderHandler: Extracts text from Microsoft Word documents.
- QuipReaderHandler: Extracts text from Quip documents.
- YouTubeReaderHandler: Downloads videos from YouTube URLs and extracts audio.
Processors:
- AmazonBedrockHandler: Summarizes text content using Amazon Bedrock.
- AmazonBedrockChatHandler: Used to perform interactive chat with Amazon Bedrock using the messages API.
- AmazonComprehendInsightsHandler: Extracts valuable insights from your data using Amazon Comprehend NLP capabilities.
- AmazonComprehendPIIHandler, AmazonComprehendPIITokenizeHandler and AmazonComprehendPIIUntokenizeHandler: Used to detect, tokenize, and untokenize PII data in your text, retaining the context and allowing downstream services such as Bedrock to process the data without PII.
- AmazonTranscriptionHandler: Transcribes audio files into text using Amazon Transcribe.
- AmazonTextractHandler: Extracts text from images such as .jpg, .png, and .tiff.
- HTMLCleanerHandler: Used to clean HTML tags when consuming web page / HTML documents.
- PromptHandler: Uses a minimalistic prompt framework - all your prompts can be stored in the prompts/ folder and you can select which prompt to use when invoking the main.py.
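The PII tokenize/untokenize step described above can be sketched as follows. This is a simplified illustration: the entity dictionaries mimic the shape of a boto3 `comprehend.detect_pii_entities` response (which in the real flow would be obtained with `comprehend.detect_pii_entities(Text=text, LanguageCode="en")`), and the project's actual handlers may differ.

```python
def tokenize_pii(text, entities):
    """Replace PII spans with placeholder tokens, keeping a mapping so the
    original values can be restored later. Entities are applied from right
    to left so earlier character offsets remain valid."""
    mapping = {}
    for i, ent in enumerate(
        sorted(entities, key=lambda e: e["BeginOffset"], reverse=True)
    ):
        token = f"[{ent['Type']}_{i}]"
        mapping[token] = text[ent["BeginOffset"]:ent["EndOffset"]]
        text = text[:ent["BeginOffset"]] + token + text[ent["EndOffset"]:]
    return text, mapping


def untokenize_pii(text, mapping):
    """Restore the original PII values in the (e.g. summarized) text."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


# Mocked Comprehend-style entities for the sample text below
text = "Contact John Doe at john@example.com"
entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36},
]
masked, mapping = tokenize_pii(text, entities)
restored = untokenize_pii(masked, mapping)
```

Because the placeholders preserve entity types in context, a downstream model such as Bedrock can still produce a coherent summary without ever seeing the raw PII.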
Writers:
- S3WriterHandler: Manages the uploading of S3 objects (files) to Amazon S3.
- LocalFileWriterHandler: Writes output into a local file.
- ClipboardWriterHandler: Writes output into the clipboard.
Handlers are linked together in a chain, where each handler passes its output to the next handler in the sequence until the processing is complete.
You can customize the processing chain in main.py by setting the sequence of handlers according to your specific needs. Here is an example of how to construct a custom processing chain:
```python
from handlers.handler_factory import HandlerFactory
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

youtube_handler = HandlerFactory.get_handler("YouTubeReaderHandler")
amazon_s3_writer_handler = HandlerFactory.get_handler("AmazonS3WriterHandler")
amazon_transcribe_handler = HandlerFactory.get_handler("AmazonTranscriptionHandler")
amazon_bedrock_handler = HandlerFactory.get_handler("AmazonBedrockHandler")
anonymize_handler = HandlerFactory.get_handler("AmazonComprehendPIITokenizeHandler")
unanonymize_handler = HandlerFactory.get_handler("AmazonComprehendPIIUntokenizeHandler")
prompt_handler = HandlerFactory.get_handler("PromptHandler")

# Read YouTube video >> Save audio in Amazon S3 >> Extract text from speech (Amazon Transcribe)
# >> Construct a prompt >> Detect & tokenize PII >> Summarize using Amazon Bedrock >> Untokenize PII
youtube_handler.set_next(amazon_s3_writer_handler).set_next(amazon_transcribe_handler).set_next(prompt_handler).set_next(anonymize_handler).set_next(amazon_bedrock_handler).set_next(unanonymize_handler)

request = {"path": "https://www.youtube.com/watch?v=tQi97_DWi6A", "prompt_file_name": "default_prompt"}
result = youtube_handler.handle(request)
print(result.get("text"))
```
With the introduction of dynamic handler discovery and command-line arguments, you can now easily customize or specify custom processing chains without altering the codebase. The CLI supports flags for using predefined or custom chains based on runtime arguments.
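One common way to implement such a factory is a class registry keyed by handler name. The sketch below is illustrative: the project's actual `HandlerFactory` discovers handlers dynamically from the `handlers` package, and its internals may differ.

```python
class HandlerFactory:
    """Registry-based factory (an illustrative sketch)."""

    _registry = {}

    @classmethod
    def register(cls, handler_cls):
        # Handlers self-register under their class name via this decorator
        cls._registry[handler_cls.__name__] = handler_cls
        return handler_cls

    @classmethod
    def get_handler(cls, name):
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown handler: {name}") from None


@HandlerFactory.register
class UpperCaseHandler:
    """A toy handler used only to demonstrate the factory."""

    def handle(self, request):
        request["text"] = request["text"].upper()
        return request


handler = HandlerFactory.get_handler("UpperCaseHandler")
result = handler.handle({"text": "hello"})
```

Keying the registry by class name is what lets callers request handlers by string (e.g. `HandlerFactory.get_handler("YouTubeReaderHandler")`) without importing each handler class explicitly, which is how new handlers can be added without altering the main codebase.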
The tool features a prompt template system in the prompts/ folder, enabling users to tailor the summarization process. This system supports custom prompts that can be specified at runtime, offering flexibility in how summaries are generated.
- Python 3.8+
- AWS CLI installed and configured with AWS credentials
- Access to an Amazon Bedrock model. See Configuration.
- An active AWS account
- Clone the repository:
- Navigate to the project directory:
- Install the required dependencies:
pip install -r requirements.txt
- Install ffmpeg.
- Create a .env file at the root of your project directory.
- Configure access to Amazon Bedrock models:
  - Log in to your Amazon Bedrock console, click Model Access > Manage model access, select the models you want to use (for example, Claude 3 Sonnet), and click Save changes.
- Add your AWS S3 configuration to the .env file:
```bash
# .env file
# AWS Specific configuration
BEDROCK_ASSUME_ROLE=None
AWS_DEFAULT_REGION="us-east-1"

# Amazon S3 bucket used for Amazon Transcribe
BUCKET_NAME=your-s3-bucket-name
S3_FOLDER=uploads/
OUTPUT_FOLDER=transcriptions/

# Local download folder
DIR_STORAGE="./downloads"

# Amazon Bedrock Settings
AMAZON_BEDROCK_MODEL_ID="anthropic.claude-3-5-sonnet-20240620-v1:0"
# AMAZON_BEDROCK_MODEL_ID="anthropic.claude-3-haiku-20240307-v1:0"
# AMAZON_BEDROCK_MODEL_ID="anthropic.claude-3-sonnet-20240229-v1:0"
AMAZON_BEDROCK_MODEL_PROPS='{"max_tokens":4096, "anthropic_version": "bedrock-2023-05-31", "messages": [{"role": "user", "content": ""}]}'
AMAZON_BEDROCK_PROMPT_TEMPLATE="{prompt_text}"
AMAZON_BEDROCK_PROMPT_INPUT_VAR="$.messages[0].content"
AMAZON_BEDROCK_OUTPUT_JSONPATH="$.content[0].text"

# AMAZON_BEDROCK_MODEL_ID="anthropic.claude-v2"
# AMAZON_BEDROCK_MODEL_PROPS='{"prompt": "", "max_tokens_to_sample":4096, "temperature":0.5, "top_k":250, "top_p":0.5, "stop_sequences":[] }'
# AMAZON_BEDROCK_PROMPT_TEMPLATE="\n\nHuman:{prompt_text}\n\nAssistant:"
# AMAZON_BEDROCK_PROMPT_INPUT_VAR="$.prompt"
# AMAZON_BEDROCK_OUTPUT_JSONPATH="$.completion"

# AMAZON_BEDROCK_MODEL_ID="amazon.titan-text-express-v1"
# AMAZON_BEDROCK_MODEL_PROPS='{"inputText": "", "textGenerationConfig":{ "maxTokenCount":4096, "stopSequences":[], "temperature":0, "topP":1 }}'
# AMAZON_BEDROCK_PROMPT_TEMPLATE="\n{prompt_text}"
# AMAZON_BEDROCK_PROMPT_INPUT_VAR="$.inputText"
# AMAZON_BEDROCK_OUTPUT_JSONPATH="$.results[0].outputText"

# Copy output to clipboard
CLIPBOARD_COPY=false

# Used for integration with Quip
QUIP_TOKEN="<your_personal_token>"
QUIP_ENDPOINT="https://platform.quip.com"
QUIP_DEFAULT_FOLDER_ID=<default_folder_for_writing>
```
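To illustrate what the two JSONPath settings refer to: AMAZON_BEDROCK_PROMPT_INPUT_VAR locates the field in the request body where the rendered prompt is injected, and AMAZON_BEDROCK_OUTPUT_JSONPATH locates the generated text in the model response. The sketch below uses plain dictionary indexing and a mocked response purely for illustration; the tool presumably resolves the JSONPath expressions at runtime.

```python
import json

# The model request template from AMAZON_BEDROCK_MODEL_PROPS (Claude messages API)
props = '{"max_tokens":4096, "anthropic_version": "bedrock-2023-05-31", "messages": [{"role": "user", "content": ""}]}'
body = json.loads(props)

# AMAZON_BEDROCK_PROMPT_INPUT_VAR ("$.messages[0].content") points here,
# i.e. where the rendered prompt is injected before invoking the model:
body["messages"][0]["content"] = "Summarize the following text: ..."

# AMAZON_BEDROCK_OUTPUT_JSONPATH ("$.content[0].text") points at the summary
# in the model's response (mocked here):
response = {"content": [{"text": "A short summary."}]}
summary = response["content"][0]["text"]
```

This is why switching models only requires swapping the commented-out blocks above: each model family uses a different request/response shape, and the two JSONPath variables adapt the tool to it without code changes.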
Execute the main.py script, specifying the file path and an optional prompt template name:
python src/main.py <path_to_file_or_url> [prompt_file_name] [--chat {sum_first,chat_first,chat_only}] [--anonymize {yes}] [--custom]
- `<path_to_file_or_url>`: The path to your audio, video, or text file, or a URL.
- `[prompt_file_name]`: Optional. The name of a custom prompt template (without the `.txt` extension).
- `--chat`: Enables the interactive chat option. You can choose whether to chat before or after the summarization task, or chat only.
- `--anonymize`: Controls anonymization, which is currently enabled by default.
- `--custom`: Executes a custom chain, defined in `construct_custom_chain()` in src/main.py.
Example:
- Processing an Audio File
python src/main.py /path/to/file.mp3
- Using a custom prompt template, located in prompts/my_custom_prompt.txt:
python src/main.py /path/to/file.pdf my_custom_prompt
- Executing with a custom processing chain:
python src/main.py /path/to/file.mp3 --custom
- Having a conversation with the LLM about your document:
python src/main.py /path/to/local/file.pdf --chat=chat_only
This project takes inspiration from the Amazon Bedrock Workshop, provided by AWS Samples. The workshop offers a comprehensive guide and tools for integrating Amazon Bedrock into applications, which have been instrumental in developing the summarization functionalities of this project. For more information and access to these resources, visit the Amazon Bedrock Workshop on GitHub.
- AWS CLI Installation: Ensure the AWS CLI is installed and configured with your AWS credentials.
- Use at Your Own Risk: The author assumes no liability for any charges or consequences arising from the use of this tool.
Contributions are welcome! Please feel free to fork the repository, make changes, and submit pull requests.
This project is released under the MIT License.