Abruzzo SEO specialist, .Net programming and computer stuff
On the 26th of October I started a test to monitor how PDF documents are crawled and indexed by search engines.
The test aimed to help me understand the following things about PDF documents:
After just one week, Google started to show some concrete results. The other search engines are still looking around; only Ask returned a couple of results, but nothing worth talking about.
The scope of this document is to highlight the day-by-day fluctuations of the different PDF documents (13 in total).
These are the first, and most interesting, results I collected during the past weeks for the main document returned by a SERP generated using the URK “seiunamicone”.
They are followed by the hidden results, the ones that can be seen by expanding the hidden menu (click on the plus symbol).
I’ve been monitoring the SERP for a while and, apart from the first days when there were continuous fluctuations, the results now seem to have stabilized.
Showing a lot of images would probably have been confusing; however, I can assure you that a lot of changes took place, and I presume that looking at the SERP tomorrow, something new could turn up.
Just to give you an example outside the above picture, on the 3rd of November the third result was a document called PDF-test-without-headers-KD43.pdf, my test n. 11. To be honest, seeing it there confused me, and I wasn’t able to figure out how it was possible.
That’s the reason why I included a graph collecting the different SERP changes.
This is the full graph.
Whilst this is a graph with only the documents that actually appeared in the SERP during the period in which I monitored it.
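For readers who want to build a similar reduced graph, here is a minimal Python sketch of the filtering step. It assumes a daily log of positions per document; the `documents_seen` function, the log layout and the sample file names are all made up for illustration and are not the real test data.

```python
from typing import Dict, List, Optional

# Hypothetical daily log: day -> {document name: SERP position, or None if absent}.
DailyLog = Dict[str, Dict[str, Optional[int]]]


def documents_seen(log: DailyLog) -> List[str]:
    """Return, sorted by name, the documents that had a position on at least one day."""
    seen = set()
    for positions in log.values():
        for name, position in positions.items():
            if position is not None:
                seen.add(name)
    return sorted(seen)


# Made-up sample data for illustration only; not the real test results.
sample_log = {
    "day-1": {"doc-a.pdf": 1, "doc-b.pdf": None, "doc-c.pdf": None},
    "day-2": {"doc-a.pdf": 2, "doc-b.pdf": 1, "doc-c.pdf": None},
}
print(documents_seen(sample_log))
```

Only the documents returned by such a filter would be plotted in the reduced graph; the ones that never showed up are simply left out.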
Let’s analyze it together, but first let me remind you of something about the documents generated. I assumed a KWD of the URK “seiunamicone” split between the page (42%) and the document properties (56%), and fake headers where H1 and H2 were created using pure emphasis instead of Word styles.
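As a rough illustration of what a keyword density figure means, here is a minimal Python sketch. It assumes the common definition (keyword occurrences divided by total words); the exact formula used to build the test documents is not stated here, and the `keyword_density` function and the sample text are made up for illustration.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return the share of words in `text` equal to `keyword`, as a percentage.

    Assumption: density = keyword occurrences / total words * 100.
    Matching is case-insensitive and on whole words only.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# Hypothetical sample text, not taken from the test PDFs.
sample = "seiunamicone is a test word and seiunamicone appears twice here"
density = keyword_density(sample, "seiunamicone")
print(f"{density:.1f}%")
```

Under this definition, a document whose text consists only of the keyword would score 100%, which is one plausible reading of a "KD of 100%" file.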
The first PDF to be indexed was a document called Test 7 (PDF-test-without-header2-KD100.pdf). This document contains an H1 made using Word styles and a fake H2, just emphasized text, with a KD of 100%. After just a few days, this document was completely dropped from the Google SERP. Today it is in the index but ranks nowhere.
A comfortable result for test number 5 (28% KD and one header), whilst tests 3, 10 and 13, for example, were not indexed at all.
If we analyze only the first three results (the first one and its aggregates) plus the first result shown when expanding the hidden results, we get the following picture.
Positive results were collected for test number 12, which has always been present in the SERP and has now been stable in position 1 for about one week; for test number 1.1, which fluctuated a little but is now stable in position 2; and finally for test 11, which, apart from disappearing on some days, has always been in position three.
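To put a number on "stable for about one week", one could count how many of the most recent days a document has held its current position. The sketch below is only an illustration: the `trailing_streak` function and the sample history are hypothetical and do not reproduce the real measurements.

```python
from typing import List, Optional


def trailing_streak(positions: List[Optional[int]]) -> int:
    """Count how many of the most recent days share the last day's position.

    `positions` is ordered oldest to newest; None means the document was
    not in the SERP that day. Returns 0 if the list is empty or the
    document is absent on the last day.
    """
    if not positions or positions[-1] is None:
        return 0
    last = positions[-1]
    streak = 0
    for position in reversed(positions):
        if position != last:
            break
        streak += 1
    return streak


# Made-up history for illustration only; not the real test data.
history = [3, None, 2, 1, 1, 1]
print(trailing_streak(history))  # 3
```

A day on which the document disappears breaks the streak, so a result with "daily disappearances" would score lower than one that never left its position.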
So what makes the difference for these documents?
I am almost sure Google is able to interpret the RTF code contained in a PDF document (most probably by doing a sort of reverse engineering). This sounds like a strong assertion (and maybe it is, so please take it just as my personal opinion), but it’s the only explanation I was able to find when I tried to answer the question “Why these?”
Analyzing the SERPs, I saw that after the KWD factor, the headers have their own importance. So, today, what could be the answer to the following question?
If I had to answer this dilemma frankly today, I would probably go with the following bullet points:
I believe I haven’t forgotten anything, and I hope you enjoyed reading this post.
My name is Andrea Moro. I have been passionate about computers since I was 8 years old, when my father gave me a C64.
A few years later came my first PC, and in 1994 my first experience with the Internet, which I immediately fell in love with and which today drives my business of web design and search engine positioning.