docs(README): add installation steps and create new Jupyter notebook for Legat4me #680

Open
wants to merge 588 commits into
base: main
f3fb594
feat(docker): update Dockerfile for improved installation process and…
unclecode Nov 16, 2024
6ca65b6
feat(docs): enhance deployment documentation with one-click setup, AP…
unclecode Nov 16, 2024
3561bac
feat(cache): introduce CacheMode and CacheContext for enhanced cachin…
unclecode Nov 17, 2024
34b8c44
fix(docs): remove unnecessary blank line in README for improved reada…
unclecode Nov 17, 2024
067afa7
feat(crawl): implement direct crawl functionality and introduce Cache…
unclecode Nov 17, 2024
46bacda
feat(database): implement version management and migration checks dur…
unclecode Nov 17, 2024
8e3d025
Update changelog for 0.3.74
unclecode Nov 17, 2024
3e249a1
feat(docs): update examples and documentation to replace bypass_cache…
unclecode Nov 17, 2024
31da621
feat(docs): update README for version 0.3.74 with new features and im…
unclecode Nov 17, 2024
0e0f37b
feat(docker): add Docker Compose configurations for local and hub dep…
unclecode Nov 18, 2024
de3c2a6
Merge remote-tracking branch 'origin/main' into 0.3.74
unclecode Nov 18, 2024
770167b
chore: update .gitignore to include manage-collab.sh
unclecode Nov 19, 2024
66f2fb6
test: trying to push to main
Nov 19, 2024
5e7a5c3
test1: trying to push to main
Nov 19, 2024
4903808
Update .gitignore to include .gitboss/ and todo_executor.md
unclecode Nov 19, 2024
9ef4a43
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Nov 19, 2024
48fe898
Remove test files
unclecode Nov 19, 2024
1816934
test: trying to push to 0.3.74
Nov 19, 2024
7813705
Delete test3.txt
unclecode Nov 19, 2024
038b657
Update .gitignore to exclude additional scripts and files
unclecode Nov 19, 2024
e164891
chore: add manage-collab.sh to .gitignore
unclecode Nov 19, 2024
9b91404
Merge branch '0.3.74' of https://github.com/unclecode/crawl4ai into 0…
unclecode Nov 19, 2024
a17f19b
Fix #260 prevent pass duplicated kwargs to scrapping_strategy (#269)
darwing1210 Nov 20, 2024
5002f9e
fix: crawler strategy exception handling and fixes (#271)
NanmiCoder Nov 20, 2024
d06a160
In this commit, we introduce the new concept of MakrdownGenerationStr…
unclecode Nov 21, 2024
7086441
feat: enhance image processing capabilities
unclecode Nov 22, 2024
ae28a1c
Update Redme
unclecode Nov 22, 2024
8b4144f
feat: enhance Markdown generation to include fit_html attribute
unclecode Nov 22, 2024
5de2033
chore: update README to reflect new features and improvements in vers…
unclecode Nov 22, 2024
fbbd770
chore: update README to include new features and improvements for ver…
unclecode Nov 22, 2024
2657010
Merge branch '0.3.74'
unclecode Nov 22, 2024
29af3d2
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Nov 22, 2024
2852e6e
feat: add enhanced markdown generation example with citations and fil…
unclecode Nov 22, 2024
3b1c5d0
refactor: Add group ID to for images extracted from srcset.
unclecode Nov 23, 2024
d1fc667
feat: update version to 0.3.741 and enhance content filtering with he…
unclecode Nov 23, 2024
24d8f45
chore: remove Railway deployment configuration and related documentation
unclecode Nov 24, 2024
1f2f025
feat: add support for arm64 platform in Docker commands and update IN…
unclecode Nov 24, 2024
81377e8
feat: update version to 0.3.742
unclecode Nov 24, 2024
aa63172
chore: remove deprecated Docker Compose configurations for crawl4ai s…
unclecode Nov 24, 2024
7e09ad9
chore: remove deprecated Docker Compose configurations for crawl4ai s…
unclecode Nov 24, 2024
691ec2b
docs: update CONTRIBUTORS.md to acknowledge aadityakanjolia4 for fixi…
unclecode Nov 27, 2024
6b37d50
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Nov 27, 2024
fdd6198
docs: enhance development installation instructions (#286)
nelzomal Nov 27, 2024
1850cb1
Fix: handled the cases where markdown_with_citations, references_mark…
HamzaFarhan Nov 27, 2024
19ed241
Enhance features and documentation
unclecode Nov 28, 2024
355f14d
Merge branch 'next' of https://github.com/unclecode/crawl4ai into next
unclecode Nov 28, 2024
ce8c7c1
feat: update changelog for version 0.3.743 with new features, improve…
unclecode Nov 28, 2024
8f67e07
Merge branch 'main' into 0.3.743
unclecode Nov 28, 2024
75781cf
fix: resolve merge conflict in DefaultMarkdownGenerator affecting fit…
unclecode Nov 28, 2024
98176e1
docs: update README for version 0.3.743 with new features, enhancemen…
unclecode Nov 28, 2024
9798956
docs: update README for version 0.3.743 with new features, enhancemen…
unclecode Nov 28, 2024
e9c3349
docs: update README to keep details open for extraction capabilities,…
unclecode Nov 28, 2024
b10e342
docs: update README for version 0.3.743 with improved formatting and …
unclecode Nov 28, 2024
79773b1
refactor: update cache handling in quickstart_async example to use Ca…
unclecode Nov 28, 2024
483bc8c
fix: correct typo in function documentation for clarity and accuracy
unclecode Nov 28, 2024
7892f15
docs: update README to reflect new branding and improve section headi…
unclecode Nov 28, 2024
0fa9f3a
docs: refine README content for clarity and conciseness, improving de…
unclecode Nov 28, 2024
8298135
docs: enhance README with development TODOs and refine mission statem…
unclecode Nov 28, 2024
57d1cb8
Merge branch 'next' - Update README, and quickstart examples
unclecode Nov 28, 2024
ef25f4e
docs: update quickstart_async.py to enable example function calls for…
unclecode Nov 28, 2024
44ec570
feat: implement create_box_message utility for formatted error messag…
unclecode Nov 28, 2024
2c611c4
chore: update version to 0.3.744 and add publish.sh to .gitignore
unclecode Nov 28, 2024
a20d2ce
docs: fix link formatting for recent updates section in README
unclecode Nov 28, 2024
353e99a
docs: fix link formatting for recent updates section in README
unclecode Nov 28, 2024
ecf4819
docs: fix link formatting for recent updates section in README
unclecode Nov 28, 2024
3a3c85e
docs: fix link formatting for recent updates section in README
unclecode Nov 28, 2024
d2fbf2a
CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string (#298)
paulokuong Nov 28, 2024
a06f4c5
Enhance setup process and update contributors list
unclecode Nov 28, 2024
0046a90
chore: update version to 0.3.745
unclecode Nov 28, 2024
9445951
fix: improve handling of CRAWL4_AI_BASE_DIRECTORY environment variabl…
unclecode Nov 28, 2024
252fd1b
Merge branch 'next'
unclecode Nov 28, 2024
d798151
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Nov 28, 2024
be34bb9
docs: update README to reflect latest version v0.3.745
unclecode Nov 28, 2024
c991b2b
fix: update package versions in requirements.txt for compatibility
unclecode Nov 28, 2024
698550f
Update README.md (#300)
unclecode Nov 28, 2024
cdc163b
Migrating from the classic setup.py to a using PyProject approach.
unclecode Nov 29, 2024
3ff0a1e
refactor: remove legacy build hooks and setup files, migrate to setup…
unclecode Nov 29, 2024
320d8d5
Enhance installation and migration processes
unclecode Nov 29, 2024
38a8490
Refactor Dockerfile and clean up main.py
unclecode Nov 29, 2024
025b102
Enhance Docker support and improve installation process
unclecode Nov 29, 2024
c18bf84
docs: update Raspberry Pi section to indicate upcoming support
unclecode Nov 29, 2024
48bd41a
Merge branch 'next'
unclecode Nov 29, 2024
04d8089
:adhesive_bandage: Page-evaluate navigation destroyed error (#304)
dvschuyl Nov 29, 2024
6aa3de0
fix: handle errors during image dimension updates in AsyncPlaywrightC…
unclecode Nov 29, 2024
d8b7c2f
docs: add contributor entry for dvschuyl regarding AsyncPlaywrightCra…
unclecode Nov 29, 2024
c652475
Enhance User-Agent Handling
unclecode Nov 30, 2024
568c5ec
bump version to 0.3.747
unclecode Nov 30, 2024
d2ae877
Add PruningContentFilter with unit tests and update documentation
unclecode Dec 1, 2024
ccf2c22
fix: pass logger to WebScrapingStrategy and update score computation …
unclecode Dec 2, 2024
34b00c8
refactor: improve error handling in DataProcessor and optimize data p…
unclecode Dec 3, 2024
e7e1afd
docs: update README and blog for version 0.4.0 release, highlighting …
unclecode Dec 3, 2024
b9a3270
Updated to version 0.4.0 with new features
unclecode Dec 4, 2024
9a44c20
Merge branch 'next'
unclecode Dec 4, 2024
bc76574
Merge issues with 0.4.0 is over
unclecode Dec 4, 2024
c35c7b9
Refactored web scraping components
unclecode Dec 5, 2024
b2e36a2
feat: Enhance AsyncPlaywrightCrawlerStrategy with text-only and light…
unclecode Dec 8, 2024
7ad0d03
Merge branch 'next'
unclecode Dec 8, 2024
d066b41
fixing Readmen tap (#313)
olavohenrique03 Dec 9, 2024
bc8c7f2
fix: The extract method logs output only when self.verbose is set to …
lu4nx Dec 9, 2024
a36506c
Commit Message:
unclecode Dec 9, 2024
14aaafb
Fixed typo (#324)
moamamun Dec 9, 2024
56397ab
Implement new async crawler features and stability updates
unclecode Dec 10, 2024
44704ff
Add PDF & screenshot functionality, new tutorial
unclecode Dec 10, 2024
d818c8b
Update async_webcrawler.py (#337)
lvzhengri Dec 10, 2024
af7ac5c
Add full-page screenshot and PDF export features
unclecode Dec 10, 2024
d67ffb6
Enhance AsyncWebCrawler and related configurations
unclecode Dec 12, 2024
a88d192
Bump version to 0.4.2
unclecode Dec 12, 2024
67f646c
chore: Update .gitignore to include new files and directories
unclecode Dec 12, 2024
ad90208
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Dec 12, 2024
71d964e
Add release notes and documentation for version 0.4.2: Configurable C…
unclecode Dec 12, 2024
d69861c
Merge branch 'next'
unclecode Dec 12, 2024
69e2493
Update README for version 0.4.2: Reflect new features and enhancements
unclecode Dec 12, 2024
75caa70
Feature: Add Markdown generation to CrawlerRunConfig
unclecode Dec 13, 2024
c3a0599
Fix js_snipprt issue 0.4.21
unclecode Dec 15, 2024
8798c73
Bump version to 0.4.22
unclecode Dec 15, 2024
ea83974
Enhance crawler features and improve documentation
unclecode Dec 16, 2024
7d6c427
Bump version to 0.4.23
unclecode Dec 16, 2024
8c6581e
Enhance crawler strategies with new features
unclecode Dec 17, 2024
cd12fe9
Enhance Crawl4AI with new features and documentation
unclecode Dec 19, 2024
7c1d8f7
Refactor deployment configuration and enhance browser debugging options
unclecode Dec 20, 2024
a9ac9c0
Commit Message:
unclecode Dec 21, 2024
ab7b391
Fix #340 example llm_extraction (#358)
Haopeng138 Dec 24, 2024
6920d68
Enhance crawler capabilities and documentation
unclecode Dec 25, 2024
30b9159
Commit Message:
unclecode Dec 26, 2024
ccd844d
Renames browser_config param to config in AsyncWebCrawler
unclecode Dec 26, 2024
f9387cb
Update simple-crawling.md (#379)
iamrobins Dec 27, 2024
a2592a0
Commit Message:
unclecode Dec 29, 2024
7afec4a
Update README.md (#389)
unclecode Dec 30, 2024
d8e995b
Update README.md (#390)
unclecode Dec 30, 2024
e1ec879
Update the Tutorial section for new document version
unclecode Dec 31, 2024
15e93c8
Add ".do" to gitignore
unclecode Dec 31, 2024
a70cfd3
Recreate .do folder for removal
unclecode Dec 31, 2024
fd053ed
Recreate .do folder with temporary file
unclecode Dec 31, 2024
eb795c5
Remove .do folder from remote
unclecode Dec 31, 2024
2bc06c7
Delete .do/deploy.template.yaml (#394)
unclecode Dec 31, 2024
ae5a47d
update gitignore
unclecode Dec 31, 2024
28374ab
Update gitignore
unclecode Dec 31, 2024
eea2275
Remove .do folder from remote repository
unclecode Dec 31, 2024
9326371
Remove .do folder
unclecode Dec 31, 2024
5c344ca
Remove .do folder
unclecode Dec 31, 2024
2896b3e
Remove .local folder from remote repository
unclecode Dec 31, 2024
d1638cc
chore: prepare for version 0.4.24
unclecode Dec 31, 2024
390f801
chore: resolve merge conflicts for v0.4.24
unclecode Dec 31, 2024
e503604
docs: update README badges and Docker section, reorganize documentati…
unclecode Dec 31, 2024
25bc495
chore: bump version to 0.4.25
unclecode Dec 31, 2024
4725d4b
Fixe typo in CHANGELOG
unclecode Dec 31, 2024
f5884ff
Add 0.4.24 walkthrough
unclecode Dec 31, 2024
a735283
Update 0.4.24 walkthrough
unclecode Dec 31, 2024
8ba69cc
Fix issue in 0.4.24 walkthrough
unclecode Dec 31, 2024
e837856
Fix issue in 0.4.24 walkthrough
unclecode Dec 31, 2024
3adca35
Fix bug reported in issue https://github.com/unclecode/crawl4ai/issue…
unclecode Jan 1, 2025
b57f6fa
Bumb version v0.4.241
unclecode Jan 1, 2025
823154e
Uphrade plawyright installation command to install dependencies
unclecode Jan 1, 2025
49c6c69
refactor(install): simplify Playwright installation error handling
unclecode Jan 1, 2025
5a7ed4b
docs: update project description emojis
unclecode Jan 1, 2025
230b485
build: modernize package configuration with pyproject.toml
unclecode Jan 1, 2025
1e1cfc4
refactor(build): simplify setup.py configuration
unclecode Jan 1, 2025
0f16f1f
feat(install): specify chrome and chromium for playwright
unclecode Jan 1, 2025
35ec1f4
feat(install): add doctor command and force browser install
unclecode Jan 1, 2025
4676ebb
docs: simplify installation instructions
unclecode Jan 1, 2025
968a1c0
refactor(install): use chromium as default browser
unclecode Jan 1, 2025
558a899
docs: update REAME browser installation command
unclecode Jan 1, 2025
93bc1e8
fix: ensure js_snippet files are included in package
unclecode Jan 1, 2025
7d7abe1
build: streamline package discovery and bump to v0.4.243
unclecode Jan 1, 2025
5178c65
docs: update README
unclecode Jan 1, 2025
1fe0987
Merge branch 'v0.4.243'
unclecode Jan 1, 2025
8740a72
fix(browser)!: default to Chromium channel for new headless mode (#387)
Umpire2018 Jan 1, 2025
4c5e91d
fix(browser): update default browser channel to chromium and simplify…
unclecode Jan 1, 2025
20ed76b
Update Version
unclecode Jan 1, 2025
50b8d5f
refactor(browser):
unclecode Jan 1, 2025
1d57135
- Bump version to 0.4.244
unclecode Jan 1, 2025
3f62e8d
Update README
unclecode Jan 1, 2025
55b938b
fix(browser): resolve merge conflicts in browser channel configuration
unclecode Jan 1, 2025
9d307ee
refactor(crawler): optimize response handling and default settings
unclecode Jan 1, 2025
c460a93
refactor(crawler):
unclecode Jan 1, 2025
a329261
Update version file
unclecode Jan 1, 2025
6ebde6b
Merge branch 'vr0.4.246'
unclecode Jan 1, 2025
8d825cd
refactor:
unclecode Jan 1, 2025
eb5af58
refactor():
unclecode Jan 2, 2025
67a10d6
fix: prevent memory leaks by ensuring proper closure of Playwright pages
unclecode Jan 3, 2025
d4207a1
fix: not working long page screenshot (#403)
TheCutestCat Jan 5, 2025
9576969
Merge branch 'main' of https://github.com/unclecode/crawl4ai into next
unclecode Jan 5, 2025
bf0b083
fix(extraction): JsonCss selector and crawler improvements
unclecode Jan 5, 2025
09edc9c
docs(extraction): add clarifying comments for CSS selector behavior
unclecode Jan 5, 2025
d7986d2
Docs: Add Code of Conduct for the project (#410)
aravindkarnam Jan 6, 2025
c412977
Update CHANGELOG
unclecode Jan 6, 2025
e15501f
Update gitignore
unclecode Jan 6, 2025
70ac608
Update gitignore
unclecode Jan 6, 2025
65794c7
Merge branch 'vr0.4.267'
unclecode Jan 6, 2025
1f04d75
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Jan 6, 2025
e57c790
refactor(docs): reorganize documentation structure and update styles
unclecode Jan 7, 2025
7b5ab42
refactor(doc)
unclecode Jan 7, 2025
669e607
Update .gitattributes
unclecode Jan 7, 2025
34201c1
Update .gitattributes
unclecode Jan 7, 2025
d0cd5cf
Update .gitattributes
unclecode Jan 7, 2025
fca06bc
chore: add .gitattributes file
unclecode Jan 8, 2025
fd8973f
Remove .codeiumignore from version control and add to .gitignore
unclecode Jan 8, 2025
83fed02
Remove .codeiumignore from version control and add to .gitignore
unclecode Jan 8, 2025
e13adbb
Update All docs 2025 8th Jan
unclecode Jan 8, 2025
d3090ee
Update all documents
unclecode Jan 8, 2025
33113db
docs(readme): update personal story and project vision
unclecode Jan 8, 2025
74f6616
docs(urls): update documentation URLs to new domain
unclecode Jan 9, 2025
8c12390
docs(urls): update documentation URLs to new domain
unclecode Jan 9, 2025
2cd9d89
feat(crawler): add memory-adaptive dispatcher with rate limiting
unclecode Jan 10, 2025
d521c64
Merge branch 'next' into next-cdp
unclecode Jan 10, 2025
2d89861
refactor(dispatcher): migrate to modular dispatcher system with enhan…
unclecode Jan 11, 2025
82fb5a3
feat(scraping): add LXML-based scraping mode for improved performance
unclecode Jan 12, 2025
6d55ad6
refactor(scraping): replace ScrapingMode enum with strategy pattern
unclecode Jan 13, 2025
953bff9
Apply Ruff Corrections
unclecode Jan 13, 2025
7e23a4d
Fixing minor typos in README (#440)
mcam10 Jan 13, 2025
e84f16a
Updated the correct link for "Contribution guidelines" in README.md (…
Jan 13, 2025
29a18ef
chore(cleanup): remove unused files and improve type hints
unclecode Jan 14, 2025
9100e2c
fix(models): make model fields optional with default values
unclecode Jan 15, 2025
57429b5
fix(dispatcher): adjust memory threshold and fix dispatcher initializ…
unclecode Jan 16, 2025
3211c64
refactor(browser): improve browser path management
unclecode Jan 17, 2025
6e91404
feat(content-filter): add LLMContentFilter for intelligent markdown g…
unclecode Jan 18, 2025
80ae767
feat(dispatcher): add streaming support for URL processing
unclecode Jan 19, 2025
fb272a7
Streamline Feature requests, bug reports and Forums with Forms & Temp…
aravindkarnam Jan 19, 2025
1055e87
feat(browser): improve browser context management and add shared data…
unclecode Jan 19, 2025
09e68ea
feat(config): add streaming support and config cloning
unclecode Jan 19, 2025
81d130d
docs(api): add streaming mode documentation and examples
unclecode Jan 19, 2025
bae596e
feat(crawler): add URL redirection tracking
unclecode Jan 19, 2025
9d19bb2
feat(extraction): add LLM-powered schema generation utility
unclecode Jan 20, 2025
4176944
feat(proxy): add proxy configuration support to CrawlerRunConfig
unclecode Jan 20, 2025
a7289ec
feat(robots): add robots.txt compliance support
unclecode Jan 21, 2025
3c51b6e
feat(release): prepare v0.4.3 beta release
unclecode Jan 21, 2025
d6748f5
docs(readme): update version and feature announcements for v0.4.3b1
unclecode Jan 21, 2025
8d33021
feat(proxy): add proxy rotation support and documentation
unclecode Jan 22, 2025
dae8bad
refactor(models): rename final_url to redirected_url for consistency
unclecode Jan 22, 2025
dbdf940
docs(examples): update demo scripts and fix output formats
unclecode Jan 22, 2025
abe3bd6
docs(examples): update v0.4.3 features demo to v0.4.3b2
unclecode Jan 22, 2025
bde2611
docs(readme): update version references and fix links
unclecode Jan 22, 2025
6e38379
Merge branch 'vr0.4.3b2'
unclecode Jan 22, 2025
a7b2982
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode Jan 22, 2025
480a360
docs(readme): resolve merge conflict and update version info
unclecode Jan 22, 2025
c08f5db
refactor(core): improve type hints and remove unused file
unclecode Jan 23, 2025
4a18a7e
docs(multi-url): improve documentation clarity and update examples
unclecode Jan 23, 2025
8d24db7
style(docs): improve code formatting in features demo
unclecode Jan 23, 2025
d379a72
refactor(examples): update API usage in features demo
unclecode Jan 23, 2025
9198d75
feat(browser): add CDP URL configuration support
unclecode Jan 24, 2025
a6fa4a2
refactor(user-agent): improve user agent generation system
unclecode Jan 25, 2025
94b703e
docs(examples): update proxy rotation demo and disable other demos
unclecode Jan 25, 2025
0bc656b
feat(demo): uncomment feature demos and add fake-useragent dependency
unclecode Jan 25, 2025
b894601
Merge branch 'vr0.4.3b3'
unclecode Jan 25, 2025
02792e0
Update README.md (#562)
unclecode Jan 26, 2025
49d00b6
docs(README): add installation steps and create new Jupyter notebook …
lassedrud Feb 14, 2025
2997699
feat(setup): add Qdrant setup instructions and new Jupyter notebook; …
lassedrud Feb 21, 2025
c67fd95
chore(.gitignore): add secrets directory to ignore list
lassedrud Feb 21, 2025
0f710ff
feat(dependencies): add new libraries for data processing and analysi…
lassedrud Feb 22, 2025
6005173
chore: update project structure and improve documentation
lassedrud Feb 25, 2025
4 changes: 4 additions & 0 deletions .env.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
GROQ_API_KEY = "YOUR_GROQ_API"
OPENAI_API_KEY = "YOUR_OPENAI_API"
ANTHROPIC_API_KEY = "YOUR_ANTHROPIC_API"
# You can add more API keys here
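A minimal sketch of how a file in this `KEY = "value"` format could be read and checked for unfilled placeholders. The helper names `load_env_file` and `missing_keys` are hypothetical and not part of crawl4ai; the parsing rules (spaces around `=`, optional quotes, `#` comments) are assumptions based only on the four lines above.

```python
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY = "value" lines (hypothetical helper, not a crawl4ai API)."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and strip surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or still hold a YOUR_... placeholder."""
    return [k for k in required if not values.get(k) or values[k].startswith("YOUR_")]
```

Calling `missing_keys(load_env_file(".env.txt"), ["GROQ_API_KEY"])` on the unedited template would report `GROQ_API_KEY`, since its value still starts with `YOUR_`.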
12 changes: 12 additions & 0 deletions .gitattributes
@@ -0,0 +1,12 @@
# Documentation
*.html linguist-documentation
docs/* linguist-documentation
docs/examples/* linguist-documentation
docs/md_v2/* linguist-documentation

# Explicitly mark Python as the main language
*.py linguist-detectable=true
*.py linguist-language=Python

# Exclude HTML from language statistics
*.html linguist-detectable=false
59 changes: 59 additions & 0 deletions .github/DISCUSSION_TEMPLATE/feature-requests.yml
@@ -0,0 +1,59 @@
title: "[Feature Request]: "
labels: ["⚙️ New"]
body:
  - type: markdown
    attributes:
      value: |
        Thank you for your interest in suggesting a new feature! Before you submit, please take a moment to check whether it already exists in
        this discussions category to avoid duplicates. 😊

  - type: textarea
    id: needs_to_be_done
    attributes:
      label: What needs to be done?
      description: Please describe the feature or functionality you'd like to see.
      placeholder: "e.g., Return alt text along with images scraped from a webpage in Result"
    validations:
      required: true

  - type: textarea
    id: problem_to_solve
    attributes:
      label: What problem does this solve?
      description: Explain the pain point or issue this feature will help address.
      placeholder: "e.g., Bypass Captchas added by Cloudflare"
    validations:
      required: true

  - type: textarea
    id: target_users
    attributes:
      label: Target users/beneficiaries
      description: Who would benefit from this feature? (e.g., specific teams, developers, users, etc.)
      placeholder: "e.g., Marketing teams, developers"
    validations:
      required: false

  - type: textarea
    id: current_workarounds
    attributes:
      label: Current alternatives/workarounds
      description: Are there any existing solutions or workarounds? How does this feature improve upon them?
      placeholder: "e.g., Users manually select the CSS classes mapped to data fields to extract them"
    validations:
      required: false

  - type: markdown
    attributes:
      value: |
        ### 💡 Implementation Ideas

  - type: textarea
    id: proposed_approach
    attributes:
      label: Proposed approach
      description: Share any ideas you have for how this feature could be implemented. Point out any challenges you foresee
        and the success metrics for this feature.
      placeholder: "e.g., Implement a breadth-first traversal algorithm for scraper"
    validations:
      required: false
127 changes: 127 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,127 @@
name: Bug Report
description: Report a bug with Crawl4AI.
title: "[Bug]: "
labels: ["🐞 Bug","🩺 Needs Triage"]
body:
  - type: input
    id: crawl4ai_version
    attributes:
      label: crawl4ai version
      description: Specify the version of crawl4ai you are using.
      placeholder: "e.g., 2.0.0"
    validations:
      required: true

  - type: textarea
    id: expected_behavior
    attributes:
      label: Expected Behavior
      description: Describe what you expected to happen.
      placeholder: "Provide a detailed explanation of the expected outcome."
    validations:
      required: true

  - type: textarea
    id: current_behavior
    attributes:
      label: Current Behavior
      description: Describe what is happening instead of the expected behavior.
      placeholder: "Describe the actual result or issue you encountered."
    validations:
      required: true

  - type: dropdown
    id: reproducible
    attributes:
      label: Is this reproducible?
      description: Indicate whether this bug can be reproduced consistently.
      options:
        - "Yes"
        - "No"
    validations:
      required: true

  - type: textarea
    id: inputs
    attributes:
      label: Inputs Causing the Bug
      description: Provide details about the inputs causing the issue.
      placeholder: |
        - URL(s):
        - Settings used:
        - Input data (if applicable):
      render: bash

  - type: textarea
    id: steps_to_reproduce
    attributes:
      label: Steps to Reproduce
      description: Provide step-by-step instructions to reproduce the issue.
      placeholder: |
        1. Go to...
        2. Click on...
        3. Observe the issue...
      render: bash

  - type: textarea
    id: code_snippets
    attributes:
      label: Code snippets
      description: Provide code snippets (if any). Add comments as necessary.
      placeholder: print("Hello world")
      render: python

  # Header Section with Title
  - type: markdown
    attributes:
      value: |
        ## Supporting Information
        Please provide the following details to help us understand and resolve your issue. This will assist us in reproducing and diagnosing the problem.

  - type: input
    id: os
    attributes:
      label: OS
      description: Please provide the operating system & distro where the issue occurs.
      placeholder: "e.g., Windows, macOS, Linux"
    validations:
      required: true

  - type: input
    id: python_version
    attributes:
      label: Python version
      description: Specify the Python version being used.
      placeholder: "e.g., 3.8.5"
    validations:
      required: true

  # Browser Field
  - type: input
    id: browser
    attributes:
      label: Browser
      description: Provide the name of the browser you are using.
      placeholder: "e.g., Chrome, Firefox, Safari"
    validations:
      required: false

  # Browser Version Field
  - type: input
    id: browser_version
    attributes:
      label: Browser version
      description: Provide the version of the browser you are using.
      placeholder: "e.g., 91.0.4472.124"
    validations:
      required: false

  # Error Logs Field (Text Area)
  - type: textarea
    id: error_logs
    attributes:
      label: Error logs & Screenshots (if applicable)
      description: If you encountered any errors, please provide the error logs. Attach any relevant screenshots to help us understand the issue.
      placeholder: "Paste error logs here and attach your screenshots"
    validations:
      required: false
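The required fields of this form (crawl4ai version, OS, Python version) can be gathered programmatically before filing a report. The following is a rough sketch using only the standard library; the function name `collect_bug_report_info` is hypothetical, and it assumes the package is installed under the distribution name `crawl4ai`.

```python
import platform
from importlib import metadata


def collect_bug_report_info(package: str = "crawl4ai") -> dict[str, str]:
    """Gather values for the required version and OS fields of the bug report form."""
    try:
        # Reads the installed distribution's version from its metadata.
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "crawl4ai version": package_version,
        "OS": f"{platform.system()} {platform.release()}",
        "Python version": platform.python_version(),
    }


if __name__ == "__main__":
    for label, value in collect_bug_report_info().items():
        print(f"{label}: {value}")
```

The printed lines can be pasted into the matching form fields; browser name and version are left out because they depend on how the crawler was configured.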
8 changes: 8 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: Feature Requests
    url: https://github.com/unclecode/crawl4ai/discussions/categories/feature-requests
    about: "Suggest new features or enhancements for Crawl4AI"
  - name: Forums - Q&A
    url: https://github.com/unclecode/crawl4ai/discussions/categories/forums-q-a
    about: "Ask questions or engage in general discussions about Crawl4AI"
19 changes: 19 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,19 @@
## Summary
Please include a summary of the change and/or which issues are fixed.

eg: `Fixes #123` (Tag GitHub issue numbers in this format, so it automatically links the issues with your PR)

## List of files changed and why
eg: quickstart.py - To update the example as per new changes

## How Has This Been Tested?
Please describe the tests that you ran to verify your changes.

## Checklist:

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have added/updated unit tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
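The `Fixes #123` convention in the Summary section relies on GitHub's closing keywords (close, fix, resolve and their inflections). As an illustration only, a PR description could be scanned for such references roughly as follows; the function name `linked_issues` and the regular expression are assumptions made for this sketch, not something defined by the template or by GitHub.

```python
import re

# Closing keywords followed by an issue number, e.g. "Fixes #123".
CLOSING_REFERENCE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def linked_issues(pr_description: str) -> list[int]:
    """Return issue numbers that follow a closing keyword, in order of appearance."""
    return [int(number) for number in CLOSING_REFERENCE.findall(pr_description)]
```

A bare mention such as `see #9` is deliberately not matched, because only a keyword-prefixed reference links and closes the issue.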
74 changes: 73 additions & 1 deletion .gitignore
@@ -1,3 +1,6 @@
#secrets
secrets/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -164,4 +167,73 @@ cython_debug/
Crawl4AI.egg-info/
Crawl4AI.egg-info/*
crawler_data.db
.vscode/
.vscode/
.tests/
.test_pads/
test_pad.py
test_pad*.py
.data/
Crawl4AI.egg-info/

requirements0.txt
a.txt

*.sh
.idea
docs/examples/.chainlit/
docs/examples/.chainlit/*
.chainlit/config.toml
.chainlit/translations/en-US.json

local/
.files/

a.txt
.lambda_function.py
ec2*

update_changelog.sh

.DS_Store
docs/.DS_Store
tmp/
test_env/
**/.DS_Store
**/.DS_Store

todo.md
todo_executor.md
git_changes.py
git_changes.md
pypi_build.sh
git_issues.py
git_issues.md

.next/
.tests/
# .issues/
.docs/
.issues/
.gitboss/
todo_executor.md
protect-all-except-feature.sh
manage-collab.sh
publish.sh
combine.sh
combined_output.txt
.local
.scripts
tree.md
tree.md
.scripts
.local
.do
/plans
plans/

# Codeium
.codeiumignore
todo/

# windsurf rules
.windsurfrules
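To get a quick feel for which paths the simpler patterns above would cover, a rough approximation can be written with `fnmatch`. This is a deliberately simplified sketch, not Git's actual matching algorithm: it ignores negation (`!`), leading-slash anchoring, `**`, and patterns containing a slash in the middle such as `docs/examples/.chainlit/`. The function name `is_probably_ignored` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_probably_ignored(path: str, patterns: list[str]) -> bool:
    """Approximate check: does any single-component pattern match part of the path?"""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern = pattern.strip()
        # Skip blank lines and comment lines such as "#secrets".
        if not pattern or pattern.startswith("#"):
            continue
        name = pattern.strip("/")
        if pattern.endswith("/"):
            # Directory pattern: compare against parent directories only.
            if any(fnmatch(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatch(part, name) for part in parts):
            # File or glob pattern: compare against every path component.
            return True
    return False
```

For an authoritative answer inside a real repository, `git check-ignore -v <path>` reports the exact rule that applies.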