Fetch Webpage
Extract content from any webpage with automatic HTML cleaning
Node Type: Action
Category: Web Integration
Icon: Web
Overview
The Fetch Webpage node retrieves a webpage by sending an HTTP request to the URL you specify. It then cleans the returned HTML by removing scripts and styles, so later steps in your workflow receive clean, readable content.
Key Features
- Universal Web Access: Fetch content from any publicly accessible webpage
- Automatic HTML Cleaning: Remove scripts, styles, and unnecessary markup
- Clean Content Output: Get readable text content for processing
- Simple Configuration: Just provide a URL to get started
- Error Handling: Built-in success/failure tracking
Prerequisites
Web Access
The target webpages must be publicly accessible.
Technical Requirements
Node Configuration
Required Fields
URL
The complete URL of the webpage to fetch. Must include the protocol (http:// or https://).
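The protocol requirement can be checked before a URL ever reaches the node. The following is a minimal sketch in Python, assuming you validate URLs in a preceding code step; the function name `is_fetchable_url` is hypothetical and not part of the node.

```python
from urllib.parse import urlparse


def is_fetchable_url(url: str) -> bool:
    """Return True if the URL has an http/https protocol and a host.

    Hypothetical helper for illustration; the node's own validation
    rules are not documented here.
    """
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com", "example.com"):
        print(candidate, is_fetchable_url(candidate))
```

A bare host such as `example.com` fails the check because it has no protocol, which matches the requirement stated above.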
Technical Details
HTML Cleaning Process
How the node processes and cleans webpage content
Automatic Removal
The node automatically strips out the following HTML elements to provide clean content:
- <script> tags and their content
- <style> tags and their content
- All JavaScript code and CSS styles
Content Processing
The remaining HTML content is preserved, maintaining the structure and readability of the webpage while removing unnecessary technical elements that could interfere with content analysis.
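To make the cleaning step concrete, here is a rough Python sketch of removing script and style elements while leaving the rest of the markup untouched. It uses a regular expression as a simplification and is not the node's actual implementation; a real HTML parser would handle malformed markup more reliably. The function name is an assumption.

```python
import re

# Matches a <script> or <style> element together with its content.
# Simplified on purpose: regular expressions do not handle every edge
# case of real-world HTML.
_SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts_and_styles(html: str) -> str:
    """Return the HTML with script and style elements removed."""
    return _SCRIPT_STYLE.sub("", html)


if __name__ == "__main__":
    page = "<body><p>Hello</p><script>alert(1)</script></body>"
    print(strip_scripts_and_styles(page))
```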
HTTP Request Handling
How the node makes requests to webpages
Request Method
Uses HTTP GET requests to fetch webpage content. The node checks the HTTP status code of each response and reports a failure for error responses such as 403 or 404, so your workflow can react to them.
Response Processing
Converts the HTTP response to text format, then applies HTML cleaning before returning the processed content to your workflow.
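The request and text conversion described above might look roughly like the following Python sketch, built on the standard library only. The function names, the 10-second timeout, and the UTF-8 fallback are assumptions for illustration; the node's real defaults are not documented here.

```python
from typing import Optional
from urllib.request import Request, urlopen


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode response bytes to text, falling back to lenient UTF-8."""
    try:
        return raw.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or invalid bytes: replace what cannot be decoded.
        return raw.decode("utf-8", errors="replace")


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = Request(url, method="GET")
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

In this sketch, HTML cleaning would be applied to the string returned by `fetch_text` before the content is handed to the workflow.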
Examples & Use Cases
Basic Webpage Fetching
Fetch content from a simple webpage
{
  "url": "https://example.com"
}
Fetches the content from example.com and returns cleaned HTML without scripts or styles.
News Article Extraction
Extract article content from news websites
{
  "url": "https://news-site.com/article/123"
}
Fetches a specific news article and returns clean content for further processing or analysis.
Dynamic URL with Workflow Data
Use workflow variables for dynamic webpage fetching
{
  "url": "{{workflowData.targetUrl}}"
}
Uses a URL from workflow data to dynamically fetch different webpages based on your workflow logic.
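As an illustration of how a placeholder such as `{{workflowData.targetUrl}}` could be resolved, the Python sketch below substitutes dotted paths from a nested dictionary. This is an assumption about the general technique, not a description of the platform's template engine; `resolve_placeholders` and the shape of the context dictionary are hypothetical.

```python
import re

# Matches {{ dotted.path }} placeholders.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve_placeholders(template: str, context: dict) -> str:
    """Replace each {{a.b.c}} placeholder with the value at context[a][b][c]."""

    def lookup(match: "re.Match[str]") -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the path is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    data = {"workflowData": {"targetUrl": "https://example.com/page"}}
    print(resolve_placeholders("{{workflowData.targetUrl}}", data))
```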
Workflow Examples
Content Monitoring Pipeline
Monitor webpage content for changes
Workflow Structure
Step-by-Step Configuration
- Schedule Trigger: Run the workflow at regular intervals
- Fetch Webpage: Get current content from the monitored page
- AI Analysis: Analyze the content for important changes
- Content Comparison: Compare with previous versions
- Alert: Send notifications about significant changes
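The comparison step in this pipeline can be as simple as comparing fingerprints of the fetched content between runs. The Python sketch below shows one way to do that, assuming the previous fingerprint is stored somewhere between runs; the function names and the whitespace normalization are illustrative choices, not features of the platform.

```python
import hashlib
from typing import Optional


def content_fingerprint(content: str) -> str:
    """Return a SHA-256 hash of the content with whitespace normalized."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: Optional[str], current_content: str) -> bool:
    """Return True if the content differs from the stored fingerprint.

    A missing previous fingerprint (first run) counts as a change.
    """
    return previous_fingerprint != content_fingerprint(current_content)
```

Normalizing whitespace first avoids alerts for changes that only affect formatting. Pages with timestamps or rotating ads will still register as changed on every run, so a stricter pipeline would compare only the section of interest.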
Data Collection from Multiple Sources
Collect data from various websites for analysis
Use Case
Automatically fetch content from multiple websites, extract relevant information, and compile it into a comprehensive report or database.
Implementation
- Use multiple Fetch Webpage nodes for different sources
- Process each webpage's content with AI analysis
- Extract structured data from the cleaned content
- Combine and format the collected information
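The same idea can be sketched in code as a loop that fetches each source and records success or failure per URL, so one failing site does not stop the whole collection. This Python sketch takes the fetch function as a parameter; the result keys `url`, `success`, `content`, and `error` are assumptions and may not match the node's actual output fields.

```python
from typing import Callable, Dict, List


def collect_pages(urls: List[str], fetch: Callable[[str], str]) -> List[Dict[str, object]]:
    """Fetch every URL and return one result record per source."""
    results: List[Dict[str, object]] = []
    for url in urls:
        try:
            results.append({"url": url, "success": True, "content": fetch(url)})
        except Exception as error:  # keep going when a single source fails
            results.append({"url": url, "success": False, "error": str(error)})
    return results
```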
Best Practices
Do's
- Use HTTPS URLs when possible for security
- Implement rate limiting between requests
- Handle errors gracefully in your workflows
- Respect robots.txt and website terms of service
- Cache results when appropriate
- Use specific URLs rather than homepage URLs
Don'ts
- Don't make requests too frequently
- Avoid scraping private or password-protected pages
- Don't ignore HTTP error responses
- Avoid making requests to the same page repeatedly
- Don't assume all websites allow scraping
- Avoid processing sensitive content without verification
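Two of these practices, respecting robots.txt and spacing out requests, are easy to sketch in Python with the standard library. The helper names, the user agent string, and the interval value below are hypothetical; the node itself is not documented as performing either check, so treat this as logic you would add around it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def seconds_to_wait(last_request_at: float, now: float, min_interval: float) -> float:
    """Return how long to pause so requests stay min_interval seconds apart."""
    return max(0.0, min_interval - (now - last_request_at))


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(rules, "my-workflow-bot", "https://example.com/private/data"))
    print(seconds_to_wait(last_request_at=100.0, now=101.0, min_interval=5.0))
```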
Troubleshooting
Common Issues
Access Denied (403 Forbidden)
Symptoms: Node fails with access denied errors
Solution: The website may be blocking automated requests. Check whether the page requires authentication or uses anti-bot measures.
Page Not Found (404 Error)
Symptoms: Node fails with page not found errors
Solution: Verify the URL is correct and the webpage still exists. URLs can change or pages can be removed.
Timeout Errors
Symptoms: Node fails with timeout errors
Solution: The target website may be slow or experiencing issues. Try again later or check the website's status.
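The three issues above call for different reactions: a timeout is often temporary and worth retrying, while a 403 or 404 will usually not fix itself. The Python sketch below retries with exponential backoff and gives up immediately on 403 and 404. The function name, attempt count, and delays are assumptions; check whether your platform already offers retry settings before adding logic like this.

```python
import socket
import time
from urllib.error import HTTPError, URLError

NON_RETRYABLE = (403, 404)  # retrying will not help for these status codes


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying timeouts and transient errors with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except HTTPError as error:
            if error.code in NON_RETRYABLE:
                raise
            last_error = error
        except (TimeoutError, socket.timeout, URLError) as error:
            last_error = error
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise last_error
```

`fetch` can be any function that takes a URL and returns content, and `sleep` is injectable so the backoff can be tested without waiting.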