fix: tests and associated bugs #891

Open
wants to merge 12 commits into
base: main

@stevenh stevenh commented Mar 26, 2025

This is a placeholder which contains all the fixes and improvements needed to get all the existing tests running error free.

There are a small number of tests which have been marked as skipped due to flaky behaviour or significant functionality needed to make them work.

The intention is to split this out into many more consumable changes.

This should not be merged as is.

stevenh commented Mar 27, 2025

Here's what the tests look like now in VS Code:
image
image

@stevenh stevenh closed this Mar 27, 2025
@stevenh stevenh reopened this Mar 27, 2025
@stevenh stevenh force-pushed the fix/tests branch 22 times, most recently from 48c34ff to 7036b1b Compare March 27, 2025 18:13
@stevenh stevenh force-pushed the fix/tests branch 2 times, most recently from a6544c6 to 969234e Compare March 27, 2025 20:51
@aravindkarnam aravindkarnam requested a review from unclecode March 28, 2025 08:25
@aravindkarnam (Collaborator)

@stevenh Thanks for this PR. This is really going to be a life saver when we do new releases. I've requested @unclecode for a quick review. We'll keep you posted.


stevenh commented Mar 28, 2025

Thanks @aravindkarnam I've tried to minimise the differences over the past few days and it's now in a pretty good place.

As per issue #893, I was going to break this out into separate PRs, which would take quite a bit of time. If you're happy to review it as a single item, that would avoid a lot of overhead. Splitting also comes with the challenge that the tests won't pass until all the fixes are merged.

I can obviously update this PR with the details of all the fixes, which would be much quicker.

Let me know which approach you would like me to take.

In the meantime I'll continue to refine. The main thing left is ensuring that all tests use assertions, as some were just doing prints.


stevenh commented Mar 28, 2025

Oh, as a follow-on, it would be great to have the tests trigger in CI; happy to look at adding that too. The obvious challenge is the resources needed, so some tests might not be possible there, but we could have them skipped automatically if GitHub Actions is detected.
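
A minimal sketch of how that detection could work, assuming it relies on the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true` on its runners. The helper name `running_in_github_actions` is hypothetical and is not part of this PR.

```python
import os
from typing import Mapping, Optional


def running_in_github_actions(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the GITHUB_ACTIONS variable is set to "true".

    `env` defaults to the process environment; passing a mapping makes
    the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS", "").lower() == "true"
```

With pytest, a resource-heavy test could then be decorated with `@pytest.mark.skipif(running_in_github_actions(), reason="needs resources unavailable in CI")`.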

Fix a bunch more issues and parametrise more tests and add asserts.
@stevenh stevenh force-pushed the fix/tests branch 3 times, most recently from 60bbe67 to d890f18 Compare March 31, 2025 15:49
Fix how CrawlResultContainer is used, ensuring all paths return one
and deep crawl unwraps so there's only one, not one per result.

Move it into models and directly extend CrawlResult so type hinting
works as expected for the single result.

Pass crawler into crawl_url instead of using field to simplify the code
avoiding the need for None checks.

Add more examples to AsyncWebCrawler and fix formatting to display
correctly in VS Code.

Add data checks to google_search crawler.

Correct abstract method declarations so they match implementation
which relies on yield being called.

Fix use of infinite in integer context.

Fix use of js parameter instead of js_code.

Various fixes to tests, ensuring repeatability and validation using
assert instead of just prints.

Allow pytest flags to be specified on the command line when running
a test directly with python on the cli.
stevenh added 3 commits March 31, 2025 17:21
Fix the use of a common fixed port in a test, which caused failures if
that port was already in use; leverage an ephemeral port instead.
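
As an illustration of the ephemeral-port approach, the sketch below binds to port 0 so the operating system picks a free port. It uses only the standard `socket` module; the function name `reserve_ephemeral_port` is hypothetical and the actual test in this PR may obtain its port differently.

```python
import socket


def reserve_ephemeral_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening TCP socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen()
    return sock


server_sock = reserve_ephemeral_port()
port = server_sock.getsockname()[1]  # the port the OS actually assigned
print(f"test server would listen on 127.0.0.1:{port}")
server_sock.close()
```

Keeping the bound socket open and handing it to the server, rather than closing it and re-binding, avoids another process taking the port in between.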

Increase the timeout on slow LLM test.
Increase timeouts for LLM tests and relax validation on LLM
JSON extraction results.
Use httpx.codes for status_code checks in tests to avoid magic values.

Add missing type hints.
@stevenh stevenh force-pushed the fix/tests branch 2 times, most recently from c1a4d15 to 1793ac5 Compare March 31, 2025 19:25
Revert quote and other formatting changes on test_cli.py to minimise
differences.
Revert formatting changes to test_docker.py to minimise differences.
stevenh added 4 commits March 31, 2025 20:45
Add details to pyproject.toml to improve the developer experience
including:
* Configuring pytest test timeouts, ignoring external warnings and
  asyncio scope.
* Disabling ruff formatting
* Creating developer package targets: dev, docker and test
Determine the server port in a way which works for both IPv4 and IPv6
addresses.
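
A sketch of one family-agnostic way to read the port, offered as an assumption about the kind of change described rather than the code in this PR. `getsockname()` returns `(host, port)` for IPv4 and `(host, port, flowinfo, scope_id)` for IPv6, so unpacking into exactly two names fails on IPv6 while indexing element 1 works for both. The helper names are hypothetical.

```python
import socket
from typing import Optional


def bound_port(sock: socket.socket) -> int:
    """Return the local port of an AF_INET or AF_INET6 socket."""
    # Index 1 is the port in both the 2-tuple (IPv4) and 4-tuple (IPv6).
    return sock.getsockname()[1]


def loopback_port(family: socket.AddressFamily, host: str) -> int:
    """Bind to an OS-assigned port on `host` and report which port it was."""
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return bound_port(sock)


port_v4 = loopback_port(socket.AF_INET, "127.0.0.1")

port_v6: Optional[int] = None
if socket.has_ipv6:
    try:
        port_v6 = loopback_port(socket.AF_INET6, "::1")
    except OSError:
        pass  # IPv6 loopback is not usable on this host
```
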
Rename test parameter function eliminating test_ prefix to prevent
pytest warning.
Only print out the comparison table on session end in verbose mode.

stevenh commented Mar 31, 2025

These changes are now in a reviewable state, with all tests passing locally.

Let me know if you want the separate PR breakdown.

Here's an example of the output from the cli:

========================================= test session starts ==========================================
platform darwin -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/steve/code/github.com/unclecode/crawl4ai
configfile: pyproject.toml
plugins: cov-6.0.0, anyio-4.9.0, asyncio-0.25.3, timeout-2.3.1, pytest_httpserver-1.1.2
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=function
timeout: 20.0s
timeout method: signal
timeout func_only: True
collected 509 items

tests/20241401/test_advanced_deep_crawl.py .                                                      [  0%]
tests/20241401/test_async_crawl_with_http_crawler_strategy.py ...                                 [  0%]
tests/20241401/test_async_crawler_strategy.py ...................                                 [  4%]
tests/20241401/test_async_markdown_generator.py ................                                  [  7%]
tests/20241401/test_async_webcrawler.py ..........                                                [  9%]
tests/20241401/test_cache_context.py .                                                            [  9%]
tests/20241401/test_deep_crawl.py ..                                                              [ 10%]
tests/20241401/test_deep_crawl_filters.py ....................................................... [ 21%]
tests/20241401/test_deep_crawl_scorers.py ........................                                [ 25%]
tests/20241401/test_http_crawler_strategy.py ...........                                          [ 27%]
tests/20241401/test_llm_filter.py .                                                               [ 28%]
tests/20241401/test_robot.py ........                                                             [ 29%]
tests/20241401/test_robot_parser.py .                                                             [ 29%]
tests/20241401/test_schema_builder.py ....                                                        [ 30%]
tests/20241401/test_stream.py .                                                                   [ 30%]
tests/20241401/test_stream_dispatch.py .                                                          [ 31%]
tests/async/test_0_4_2_browser_manager.py ......                                                  [ 32%]
tests/async/test_0_4_2_config_params.py .........                                                 [ 33%]
tests/async/test_async_downloader.py .........                                                    [ 35%]
tests/async/test_basic_crawling.py .....                                                          [ 36%]
tests/async/test_caching.py ....                                                                  [ 37%]
tests/async/test_chunking_and_extraction_strategies.py ....                                       [ 38%]
tests/async/test_content_extraction.py ......                                                     [ 39%]
tests/async/test_content_filter_bm25.py ...............                                           [ 42%]
tests/async/test_content_filter_prune.py ............                                             [ 44%]
tests/async/test_content_scraper_strategy.py ..................                                   [ 48%]
tests/async/test_crawler_strategy.py .....                                                        [ 49%]
tests/async/test_database_operations.py .....                                                     [ 50%]
tests/async/test_dispatchers.py ..........                                                        [ 52%]
tests/async/test_edge_cases.py .......                                                            [ 53%]
tests/async/test_error_handling.py ..s.ss                                                         [ 54%]
tests/async/test_evaluation_scraping_methods_performance_configs.py ......................        [ 59%]
tests/async/test_markdown_genertor.py ......                                                      [ 60%]
tests/async/test_parameters_and_options.py s......                                                [ 61%]
tests/async/test_performance.py ..s                                                               [ 62%]
tests/async/test_screenshot.py .....                                                              [ 63%]
tests/cli/test_cli.py ............                                                                [ 65%]
tests/docker/test_config_object.py .                                                              [ 65%]
tests/docker/test_core.py ........                                                                [ 67%]
tests/docker/test_crawl_task.py ssssssss                                                          [ 68%]
tests/docker/test_docker.py ......                                                                [ 70%]
tests/docker/test_dockerclient.py ..                                                              [ 70%]
tests/docker/test_serialization.py ...                                                            [ 71%]
tests/docker/test_server.py ................................ssssssss....                          [ 79%]
tests/docker/test_server_token.py ......................................ssss...                   [ 88%]
tests/hub/test_simple.py .s                                                                       [ 88%]
tests/legacy/test_cli_docs.py .                                                                   [ 89%]
tests/loggers/test_logger.py .                                                                    [ 89%]
tests/test_crawl_result_container.py ..........................................                   [ 97%]
tests/test_llmtxt.py .                                                                            [ 97%]
tests/test_scraping_strategy.py .                                                                 [ 98%]
tests/test_web_crawler.py ..s.......                                                              [100%]

======================== 482 passed, 27 skipped, 1 warning in 818.78s (0:13:38) ========================

@stevenh stevenh marked this pull request as ready for review March 31, 2025 21:11

stevenh commented Apr 1, 2025

I've re-reviewed all the changes and identified 50 potential individual PRs. Most are relatively simple bug fixes, with a few that stand out as a bit larger in scope / impact:

fix: config serialisation

Fix config serialisation by creating a new Serialisable type and adding missing module imports for ScoringStats and Logger.

This allows the config to be serialised and deserialised correctly.

Add missing initialisation for ScoringStats.

Add missing stats parameter to URLScorer and all its subclasses to ensure that the stats are serialisable.

fix: download handling

Fix the handling of file downloads in AsyncPlaywrightCrawlerStrategy which wasn't waiting for the download to complete before returning, which resulted in race conditions and incomplete or missing downloads.
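
The race described above can be shown with a small simulation. This is not Playwright code and not the fix in this PR: `fake_download`, `crawl_racy` and `crawl_waiting` are hypothetical stand-ins, with a plain `asyncio` task playing the role of the browser download.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_download(store: Dict[str, bytes], name: str) -> None:
    """Stand-in for a browser download that takes some time to finish."""
    await asyncio.sleep(0.01)
    store[name] = b"payload"


async def crawl_racy(store: Dict[str, bytes]) -> Tuple[List[str], asyncio.Task]:
    # Starts the download but returns without waiting for it.
    task = asyncio.create_task(fake_download(store, "file.bin"))
    return list(store), task


async def crawl_waiting(store: Dict[str, bytes]) -> List[str]:
    # Waits for the download to complete before reporting files.
    task = asyncio.create_task(fake_download(store, "file.bin"))
    await task
    return list(store)


async def demo() -> Tuple[List[str], List[str]]:
    racy_files, pending = await crawl_racy({})
    await pending  # let the background task finish so nothing is left dangling
    waiting_files = await crawl_waiting({})
    return racy_files, waiting_files


racy_files, waiting_files = asyncio.run(demo())
print(racy_files)     # [] because the download had not finished
print(waiting_files)  # ['file.bin']
```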

fix: markdown caching

Fix the caching of markdown field in DB / files which was only storing the single value, which caused failures when using cached results.

Export the markdown field in StringCompatibleMarkdown, so we don't need to use a private field to ensure that the value is serialised correctly.

fix: crawl result handling

Fix the handling of crawl results, which were using inconsistent types. This now uses CrawlResultContainer for all crawl results, unwrapping as needed when performing deep crawls.

This moves CrawlResultContainer into models ensuring it can be imported where needed, avoiding circular imports.

Refactor CrawlResultContainer to subclass CrawlResult to provide type hinting in the single result case and ensure consistent handling of both synchronous and asynchronous results.

fix: BM25Okapi idf calculation

Fix the idf calculation in BM25Okapi to use the correct formula. This prevents results going missing when using BM25Okapi because of zero idf values.

Removed commented out code to improve readability.
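
The PR text does not say which formula was adopted. As an illustration of how a zero idf can arise, the sketch below compares the classic BM25 idf with a commonly used smoothed variant that adds 1 inside the logarithm. Both function names are hypothetical, and neither is claimed to be the code in this PR.

```python
import math


def idf_classic(n_docs: int, doc_freq: int) -> float:
    """Classic BM25 idf; zero or negative once a term is in half the corpus."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_smoothed(n_docs: int, doc_freq: int) -> float:
    """Variant with 1 added inside the log, which stays strictly positive."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# A term that appears in 1 of 2 documents.
print(idf_classic(2, 1))   # 0.0, so the term contributes nothing to the score
print(idf_smoothed(2, 1))  # log(2), roughly 0.693
```

With the classic form, any term found in exactly half the documents scores zero, which on a small corpus can remove otherwise relevant matches.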

fix: links, media and metadata caching

Fix the storage of links, media and metadata to ensure that the correct values are stored and returned. This prevents incorrect results when using the cached results.

Use Field for default values in Media, Links and ScrapingResult pydantic models to prevent invalid results.

fix: test suite

Fix the test suite to ensure that all tests are run and validation, using asserts, is correctly performed.

Parameterise tests so that individual tests can be run from either cli or IDE.

Standardise the main wrapper to allow calling directly using python including passing pytest flags.

Use a local server where applicable to ensure test validation and avoid external dependencies, such as docker, which improves test speed and the ability to debug issues.

Add type hints to improve linting validation and IDE support.

Re-enable tests which were previously disabled due to failures, which have now been fixed.

Use constants from httpx.codes for status codes to avoid magic numbers and improve comprehension.

Limit long running tests to avoid excessive run times.

All tests are now runnable using pytest.
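
A minimal sketch of what a standardised main wrapper could look like, assuming it forwards command-line flags to `pytest.main`, which accepts a list of arguments. The helper names `pytest_args` and `run_with_pytest` are hypothetical; the wrapper used in this PR may differ.

```python
import sys
from typing import List, Sequence


def pytest_args(test_file: str, argv: Sequence[str]) -> List[str]:
    """Build the pytest argument list: this file plus any extra CLI flags."""
    return [test_file, *argv[1:]]


def run_with_pytest(test_file: str, argv: Sequence[str]) -> int:
    """Run pytest on `test_file`, passing through flags such as -v or -k."""
    import pytest  # imported lazily so only direct execution needs it

    return int(pytest.main(pytest_args(test_file, argv)))


# At the bottom of a test module:
#
# if __name__ == "__main__":
#     sys.exit(run_with_pytest(__file__, sys.argv))
```

A test file with this wrapper could then be run as `python tests/test_example.py -v -k smoke`, where the path is illustrative.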


The question is, should I split or not?

Happy to do that if it will help get all the fixes in, but obviously raising 50 individual PRs is a decent undertaking, so it would be great to confirm first.
