The JCrawler project needs help!

Both members of the JCrawler team (Patrick and I) are too busy with our day jobs to keep taking care of this project in our spare time.

So, starting from 1 Jan 2013, we will disable new posts on the forum and stop all development for the 1.x series.

If you're interested in keeping JCrawler alive, you have the following two options.

a) Join the project

JCrawler needs, at the very least, someone who can answer questions on the forum and someone who can develop and fix bugs (they may be the same person). We may remain available for occasional guidance.

NOTE: we need BOTH. If we find someone for support, but no developer, it's not enough.

b) Sponsor the project

If we get paid for developing JCrawler, it becomes part of our day job and can be handled.

Please note: I didn't say "donation". I said "sponsorship", which means paying for development hours at our hourly rates (contact us if you're interested).

We're really sorry! But hey, we (and I mean the whole JCrawler community) did a great job so far, so thank you everyone!

GiBiLogic hosts the JCrawler site and cooperates in its development and support.

They just launched a new site dedicated to their cool Joomla extensions and it's worth a visit.

TOPIC: Server load

Server load 1 year, 11 months ago #4

  • shadow
  • Fresh Boarder
Hi, I have installed JCrawler on one of my sites, but every time I run it, the server load goes through the roof (8.11 is the highest I have seen), which then results in an internal server error. I believe the problem is the server configuration, as I have other unexplained load spikes that I think are linked to search engine bots crawling my site.
Have you experienced this before?
I have a Cloud VPS with 2 GB of RAM and the following configuration:
PHP Built on: Linux dedivps-56.dedicloud.co.uk 2.6.18-194.26.1.el5.028stab079.2 #1 SMP Fri Dec 17 19:25:15 MSK 2010 x86_64
Database Version: 5.1.56
Database Collation: utf8_general_ci
PHP Version: 5.3.6
Web Server: Apache/2.2.3 (CentOS)
Web Server to PHP interface: cgi-fcgi
Joomla! Version: Joomla! 1.5.23 Stable [ senu takaa ama baji ] 04-March-2011 18:00 GMT
User Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.100 Safari/534.30
The topic has been locked.

Re: Server load 1 year, 11 months ago #5

  • zanardi
  • Administrator
Sorry for the delay in answering; I had not enabled mail notifications.

1. Crawling your site, especially if it is a fairly complex site with many links, WILL use a lot of resources. At the moment there is no "solution" for this.

2. The "Internal Server Error" is most likely due to a timeout. In a FastCGI environment, the PHP ini directives about execution time are ignored, so the script will stop after about 40 seconds.

The fact is, the only way to avoid 2. is to increase the number of parallel connections and use more resources, which brings us back to 1; but it is the only reasonable solution.

We have two possible workarounds:
- in JCrawler 1.x, adding a "chaining" structure which calls different pages instead of doing the crawling in a single shot
- in JCrawler 2.x, which is in an early development stage, a new crawling engine, possibly Ajax-based, so that we can add all kinds of priority / delay configuration options.
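
To illustrate the "chaining" idea from the first workaround, here is a minimal sketch in Python. It is not JCrawler's actual code (JCrawler is a PHP extension); every name in it (SITE, new_state, crawl_step, run_chain) is hypothetical, and an in-memory dictionary stands in for real HTTP fetches. The point is only that each step does a bounded amount of work, saves its queue, and lets a later request continue, so no single request has to run long enough to hit the timeout.

```python
import json
import time
from collections import deque

# Hypothetical in-memory site: page -> links found on it (stands in for HTTP fetches).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/a"],
    "/a/1": [],
}

def new_state(start_url):
    """Initial crawl state: one queued URL, already marked as seen."""
    return {"queue": [start_url], "seen": [start_url], "done": False}

def crawl_step(state, fetch_links, max_pages=2, budget_seconds=20.0):
    """Crawl at most max_pages pages (or until the time budget ends), then return the new state."""
    queue = deque(state["queue"])
    seen = set(state["seen"])
    deadline = time.monotonic() + budget_seconds
    pages = 0
    while queue and pages < max_pages and time.monotonic() < deadline:
        url = queue.popleft()
        pages += 1
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return {"queue": list(queue), "seen": sorted(seen), "done": not queue}

def run_chain(start_url, fetch_links, max_pages=2):
    """Simulate a chain of separate requests: the state is serialized between steps."""
    saved = json.dumps(new_state(start_url))
    steps = 0
    while True:
        state = crawl_step(json.loads(saved), fetch_links, max_pages=max_pages)
        saved = json.dumps(state)  # in a real setup this would go to a file or database
        steps += 1
        if state["done"]:
            return state["seen"], steps

if __name__ == "__main__":
    found, steps = run_chain("/", lambda url: SITE.get(url, []))
    print(found, "in", steps, "steps")
```

In a real deployment each call to crawl_step would be a separate page request, and the budget would be set safely below the server's limit (about 40 seconds in the FastCGI case described above).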
--
Francesco Abeni - JCrawler Contributor
See my other extensions on GiBiLogic Extensions site

Re: Server load 1 year, 11 months ago #6

  • shadow
  • Fresh Boarder
Thanks for the reply; I will wait for JCrawler 2.
Regards

Re: Server load 1 year, 11 months ago #7

  • zanardi
  • Administrator
shadow wrote:
Thanks for reply, I will wait for JCrawler 2.


OK, please note that there is no defined roadmap or planned release date yet; it is still in an early development stage.

Re: Server load 1 year, 10 months ago #8

First, I must congratulate you on your software!
I see that other webmasters using JCrawler have run into these problems too.
Fix a few bugs and you will be top of your category:
1. A timeout error if you have many articles.
2. More than 50% of my articles come up as 404 errors!

The problem on big sites is that the software freezes or cannot crawl all the articles.
On my site, for example, I have 8 categories; only the first 4 appear, and the others return a 404 error. Maybe crawling more slowly, to get better responses, would solve the problem. Or you could add an option to crawl only articles from a given date onward, for example "crawl from xx/xx/20xx". Right now more than 400 of my articles do not appear in my sitemap. I have tested it many times!

Re: Server load 1 year, 10 months ago #9

  • zanardi
  • Administrator
@georgeofcreta:
Thank you for your report. Yes, I also think that JCrawler is good and that with a little bugfixing it could become great. That is why I joined the project a few weeks ago.

My highest priority is the timeout issue, which should be fixed with a chaining method.

I am sorry I cannot promise any date, though. Both Patrick (the original creator and main developer of JCrawler) and I have only a little spare time to dedicate to this project, so it will slowly ... crawl forward.