Optimized Page Fetching

The OpenConnector SynchronizationService now includes an optimized page fetching implementation that dramatically improves performance when synchronizing paginated data sources by eliminating recursive overhead.

Overview

The original pagination implementation used recursive function calls which created significant overhead and performance bottlenecks. Each page fetch required a new function call stack, leading to inefficient memory usage and slower processing.

The optimized fetching implementation addresses this by:

Iterative processing: Simple loop-based approach eliminates recursive overhead
Efficient pagination detection: Smart detection of pagination patterns without extra API calls
Reduced function call overhead: Single method handles all page fetching
Performance monitoring: Detailed timing metrics for optimization analysis

Performance Impact

Before Optimization (Recursive)

21 objects across 3 pages: ~2,600ms fetch time
Recursive function calls: Each page required a new function call stack
Memory overhead: Inefficient memory usage with deep call stacks

After Optimization (Iterative)

Expected improvement: 30-50% reduction in fetch time
Simple loop processing: Single function handles all pages iteratively
Reduced overhead: Eliminates recursive function call overhead

Technical Implementation

Core Methods

`fetchAllPages()`

Main entry point that uses optimized iterative processing instead of recursive calls.

private function fetchAllPages(
    Source $source, 
    string $endpoint, 
    array $config, 
    Synchronization $synchronization, 
    int $currentPage, 
    bool $isTest = false, 
    ?bool $usesNextEndpoint = null, 
    ?bool $usesPagination = true
): array

`fetchAllPagesOptimized()`

Implements the optimized iterative fetching strategy:

Uses a simple for loop instead of recursive calls
Fetches pages one by one with minimal overhead
Efficiently determines next page information
Combines results without function call overhead

`getNextPageInfo()`

Determines next page URL and configuration based on pagination pattern analysis.

`fetchSinglePage()`

Handles individual page fetching with proper error handling and response parsing.

Pagination Support

The system supports multiple pagination patterns:

Next Endpoint URLs

{
    'next': 'https://api.example.com/data?page=2',
    'results': [...]
}

Page Number Parameters

{
    'page': 1,
    'total_pages': 5,
    'results': [...]
}

Error Handling and Fallbacks

Automatic Fallback

When parallel fetching fails, the system automatically falls back to sequential processing:

try {
    return $this->fetchAllPagesParallel(...);
} catch (\Exception $e) {
    error_log('Parallel page fetching failed, falling back to sequential: ' . $e->getMessage());
    return $this->fetchAllPagesSequential(...);
}

Timeout Protection

30-second timeout for parallel operations
50-page safety limit to prevent infinite loops
Rate limit detection and handling

Configuration

Automatic Detection

The system automatically detects:

Pagination method (next URLs vs page numbers)
Total page count
API response patterns

Safety Limits

Maximum 50 pages per synchronization
30-second timeout for parallel operations
Minimum 10 objects per page assumption for last page detection

Performance Monitoring

Timing Metrics

The system now includes detailed timing information:

{
    'timing': {
        'stages': {
            'fetch_objects': {
                'duration_ms': 800,
                'description': 'Fetching objects from external source (with parallel page fetching)',
                'objects_fetched': 21,
                'rate_limited': false,
                'fetch_method': 'parallel_optimized'
            }
        }
    }
}

Performance Analysis

Objects per second: Calculated throughput metric
Efficiency ratio: Comparison of processing vs fetch time
Slowest stage identification: Automatic bottleneck detection

Best Practices

API Considerations

Rate limiting: Ensure your API can handle concurrent requests
Connection limits: Consider server connection pool limits
Caching: Implement appropriate caching strategies

Monitoring

Watch timing metrics: Monitor 'fetch_objects' duration improvements
Check error logs: Look for fallback activations
Analyze patterns: Identify optimal pagination sizes

Troubleshooting

High timeout rates: May indicate API rate limiting
Frequent fallbacks: Could suggest network issues
Inconsistent performance: Check for variable page sizes

Expected Results

Performance Improvements

For the example case (21 objects, 3 pages):

Before: 2,600ms fetch time
Expected after: 1,800-2,000ms fetch time
Improvement: 25-35% reduction

Scalability Benefits

Linear improvement: More pages = greater time savings
Network optimization: Better bandwidth utilization
Resource efficiency: Reduced total synchronization time

Compatibility

Supported APIs

JSON APIs with standard pagination
REST APIs with next/previous links
APIs with page number parameters

Fallback Support

XML APIs: Automatic fallback to sequential
Custom pagination: Graceful degradation
Rate-limited APIs: Intelligent retry handling

Migration Notes

Automatic Activation

No configuration required: Parallel fetching is enabled by default
Transparent operation: Existing synchronizations work unchanged
Backward compatibility: Full support for existing configurations

Monitoring Migration

New timing fields: Additional metrics in synchronization logs
Performance baselines: Establish new performance expectations
Error monitoring: Watch for new error patterns during transition

Overview​

Performance Impact​

Before Optimization (Recursive)​

After Optimization (Iterative)​

Technical Implementation​

Core Methods​

fetchAllPages()​

fetchAllPagesOptimized()​

getNextPageInfo()​

fetchSinglePage()​

Pagination Support​

Next Endpoint URLs​

Page Number Parameters​

Error Handling and Fallbacks​

Automatic Fallback​

Timeout Protection​

Configuration​

Automatic Detection​

Safety Limits​

Performance Monitoring​

Timing Metrics​

Performance Analysis​

Best Practices​

API Considerations​

Monitoring​

Troubleshooting​

Expected Results​

Performance Improvements​

Scalability Benefits​

Compatibility​

Supported APIs​

Fallback Support​

Migration Notes​

Automatic Activation​

Monitoring Migration​