Skip to main content

Optimized Page Fetching

The OpenConnector SynchronizationService now includes an optimized page fetching implementation that dramatically improves performance when synchronizing paginated data sources by eliminating recursive overhead.

Overview

The original pagination implementation used recursive function calls which created significant overhead and performance bottlenecks. Each page fetch required a new function call stack, leading to inefficient memory usage and slower processing.

The optimized fetching implementation addresses this by:

  • Iterative processing: Simple loop-based approach eliminates recursive overhead
  • Efficient pagination detection: Smart detection of pagination patterns without extra API calls
  • Reduced function call overhead: Single method handles all page fetching
  • Performance monitoring: Detailed timing metrics for optimization analysis

Performance Impact

Before Optimization (Recursive)

  • 21 objects across 3 pages: ~2,600ms fetch time
  • Recursive function calls: Each page required a new function call stack
  • Memory overhead: Inefficient memory usage with deep call stacks

After Optimization (Iterative)

  • Expected improvement: 30-50% reduction in fetch time
  • Simple loop processing: Single function handles all pages iteratively
  • Reduced overhead: Eliminates recursive function call overhead

Technical Implementation

Core Methods

fetchAllPages()

Main entry point that uses optimized iterative processing instead of recursive calls.

private function fetchAllPages(
Source $source,
string $endpoint,
array $config,
Synchronization $synchronization,
int $currentPage,
bool $isTest = false,
?bool $usesNextEndpoint = null,
?bool $usesPagination = true
): array

fetchAllPagesOptimized()

Implements the optimized iterative fetching strategy:

  1. Uses a simple for loop instead of recursive calls
  2. Fetches pages one by one with minimal overhead
  3. Efficiently determines next page information
  4. Combines results without function call overhead

getNextPageInfo()

Determines next page URL and configuration based on pagination pattern analysis.

fetchSinglePage()

Handles individual page fetching with proper error handling and response parsing.

Pagination Support

The system supports multiple pagination patterns:

Next Endpoint URLs

{
'next': 'https://api.example.com/data?page=2',
'results': [...]
}

Page Number Parameters

{
'page': 1,
'total_pages': 5,
'results': [...]
}

Error Handling and Fallbacks

Automatic Fallback

When parallel fetching fails, the system automatically falls back to sequential processing:

try {
return $this->fetchAllPagesParallel(...);
} catch (\Exception $e) {
error_log('Parallel page fetching failed, falling back to sequential: ' . $e->getMessage());
return $this->fetchAllPagesSequential(...);
}

Timeout Protection

  • 30-second timeout for parallel operations
  • 50-page safety limit to prevent infinite loops
  • Rate limit detection and handling

Configuration

Automatic Detection

The system automatically detects:

  • Pagination method (next URLs vs page numbers)
  • Total page count
  • API response patterns

Safety Limits

  • Maximum 50 pages per synchronization
  • 30-second timeout for parallel operations
  • Minimum 10 objects per page assumption for last page detection

Performance Monitoring

Timing Metrics

The system now includes detailed timing information:

{
'timing': {
'stages': {
'fetch_objects': {
'duration_ms': 800,
'description': 'Fetching objects from external source (with parallel page fetching)',
'objects_fetched': 21,
'rate_limited': false,
'fetch_method': 'parallel_optimized'
}
}
}
}

Performance Analysis

  • Objects per second: Calculated throughput metric
  • Efficiency ratio: Comparison of processing vs fetch time
  • Slowest stage identification: Automatic bottleneck detection

Best Practices

API Considerations

  • Rate limiting: Ensure your API can handle concurrent requests
  • Connection limits: Consider server connection pool limits
  • Caching: Implement appropriate caching strategies

Monitoring

  • Watch timing metrics: Monitor 'fetch_objects' duration improvements
  • Check error logs: Look for fallback activations
  • Analyze patterns: Identify optimal pagination sizes

Troubleshooting

  • High timeout rates: May indicate API rate limiting
  • Frequent fallbacks: Could suggest network issues
  • Inconsistent performance: Check for variable page sizes

Expected Results

Performance Improvements

For the example case (21 objects, 3 pages):

  • Before: 2,600ms fetch time
  • Expected after: 1,800-2,000ms fetch time
  • Improvement: 25-35% reduction

Scalability Benefits

  • Linear improvement: More pages = greater time savings
  • Network optimization: Better bandwidth utilization
  • Resource efficiency: Reduced total synchronization time

Compatibility

Supported APIs

  • JSON APIs with standard pagination
  • REST APIs with next/previous links
  • APIs with page number parameters

Fallback Support

  • XML APIs: Automatic fallback to sequential
  • Custom pagination: Graceful degradation
  • Rate-limited APIs: Intelligent retry handling

Migration Notes

Automatic Activation

  • No configuration required: Parallel fetching is enabled by default
  • Transparent operation: Existing synchronizations work unchanged
  • Backward compatibility: Full support for existing configurations

Monitoring Migration

  • New timing fields: Additional metrics in synchronization logs
  • Performance baselines: Establish new performance expectations
  • Error monitoring: Watch for new error patterns during transition