Tech SEO
Technical SEO covers everything that helps you control how a search engine (primarily Google) crawls, indexes, and serves your site. It includes the following areas:
Crawl/index/serving pipeline
Duplicate content & Canonicals
Blocked resources
Robots
Sitemaps
Multi-language websites: hreflang, etc
Migrating a site: redirects
Structured data
User Experience: Core Web Vitals report / PageSpeed Insights
Search appearance: article date / title links / search result snippets
The crawling process involves both the efficient allocation of Google's crawl resources and the search engine's ability to reach every page on the site. Both issues are addressed when the SEO strategy starts with an efficient website architecture.
Website architecture
Historically, the best website architecture has implied a clear category-and-page structure (siloing), with every page reachable within one or two links from its parent page. This is not a trivial task.
It includes distributing "link juice" in such a way that all the important pages receive an adequate number of internal links, since the number of internal links signals a page's importance within the site. By contrast, the presence of so-called "orphaned" pages makes the site more expensive to crawl and diverts PageRank (InRank) from other pages.
What we check on the site:
1. Using Screaming Frog SEO Spider or similar programmes, check the level of nesting of important pages. The most important pages should be located as close to the root of the site as possible.
2. Check the number of significant internal links to important pages and the variety of anchor text (using Screaming Frog SEO Spider or similar programmes).
Related case studies:
Crawling
Crawling problems are also addressed by creating effective XML sitemap(s) and an HTML sitemap (where relevant), as well as by removing directives that block resources or pages from being crawled.
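A minimal XML sitemap following the sitemaps.org protocol might look like this (the domain and URLs are illustrative placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per indexable page; <lastmod> is optional but helps recrawling -->
  <url>
    <loc>https://www.mydomain.com/</loc>
    <lastmod>2023-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.mydomain.com/news/all</loc>
  </url>
</urlset>
```

Only canonical, indexable 200-status URLs belong in the sitemap; listing redirected, noindexed, or blocked pages sends Google mixed signals.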
It is not uncommon for pages to be "Crawled - currently not indexed" by Google. Apart from routine processing reasons, this can be caused by low page quality (as perceived by Google) or by low overall site trust.
Low-quality content can be thought of as the main cause of the crawled-but-not-indexed problem.
Low-quality content falls into two groups. The first is pages that should never be indexed: various service pages, site-search result pages, filter pages in different sections that are not optimised for indexing, and so on. These pages are standard for each CMS and/or site and are found by analysing the site structure, robots.txt, and the robots meta tags on those pages. Getting rid of this poor-quality content is easy enough. To begin with, add a robots meta tag of (noindex,nofollow) or (noindex,follow) to such pages, depending on the situation.
After the pages fall out of the index (about a month), add a disallow rule for them in robots.txt.
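The two-step sequence above can be sketched as follows (the page type is a hypothetical example):

```html
<!-- Step 1: served in the <head> of every service/search/filter page that must
     drop out of the index. "follow" keeps the page's internal links crawlable;
     use "nofollow" where even that is unwanted. -->
<meta name="robots" content="noindex,follow">
```

Step 2, roughly a month later once the pages have left the index, blocks crawling in robots.txt (e.g. `Disallow: /search/`). The order matters: a page blocked in robots.txt first can never be recrawled, so Google would never see the noindex tag at all.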
The second type of low-quality content is content that is rejected by Google itself. This is especially important for YMYL sites. It was about such content that John Mueller, the official Google spokesman, was asked: "Does the presence of a low-quality section on a site affect the quality of the entire site?" He replied that it does.
Indexing
Not indexed - reasons as reported in Google Search Console's Page indexing report:
Server error (5xx)
Redirect error
URL blocked by robots.txt
URL marked 'noindex'
Soft 404
Blocked due to unauthorised request (401)
Not found (404)
Blocked due to access forbidden (403)
Blocked by page removal tool
Crawled - currently not indexed
Discovered - currently not indexed
Alternate page with proper canonical tag
Duplicate without user-selected canonical
Duplicate, Google chose different canonical than user
Page with redirect
Duplicate content
Duplicate content arises when the same page is served under more than one URL path. This usually means the page has been written to the database several times under different parameters, or results from mistakes in the implementation of canonicals.
Parameters in URL
Example: often, the use of parameters (such as ) in the URL causes duplicate content. Compare www.mydomain.com/news/all and www.mydomain.com/news/all?page=1
Personal Anecdote:
I believe this type of duplicate content poses little problem for SEO. It is easily dealt with using the robots.txt file, and Google can normally handle it on its own, unless the number of such URLs becomes very large.
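A sketch of the robots.txt approach for the ?page= example above (the pattern is illustrative; Googlebot supports the * and $ wildcards in Disallow rules):

```txt
User-agent: *
# Block crawling of parameterised duplicates like /news/all?page=1
Disallow: /*?page=
```

An alternative, and often safer, option is a rel="canonical" from the parameterised URL to the clean one, so that link signals still consolidate on a single page instead of being blocked outright.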
Internal Duplicate Content due to website templates
The term duplicate content also covers cases where the pages are different, but the content served on one page is [almost] identical to the content of another. One reason for this is a heavily templated website structure with little unique content, sometimes called "thin" content pages.
Wrong implementation of canonicals
Canonical tags are often used incorrectly on websites. By adding a rel="canonical" element to a page, you tell search engines which version of the page should appear in search results.
When using canonical tags, it is important to make sure that the URL you specify in the rel="canonical" element leads to an existing page. Canonical links to non-existent pages make it harder to crawl the site and index content, which lowers crawling efficiency and wastes crawl budget.
The most common problem: a programmatic instruction that simply mirrors the page's own URL (parameters included) in the rel="canonical" tag, so duplicates end up canonicalising to themselves.
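A correct implementation for the paginated duplicate from the earlier example (URLs are illustrative): the canonical points to the clean, existing URL rather than echoing the current page's own parameterised address:

```html
<!-- In the <head> of https://www.mydomain.com/news/all?page=1 -->
<link rel="canonical" href="https://www.mydomain.com/news/all">
```

Note that a canonical is a hint, not a directive: Google may choose a different canonical if other signals (internal links, sitemaps, redirects) contradict it.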
Semantic markup
The HTML5 standard has provided new elements for structuring, grouping content and markup of textual content. The new semantic elements have improved the web page structure by adding meaning to the content they enclose.
The use of semantic elements is quite straightforward and simple. For example:
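A minimal page skeleton using the HTML5 structural elements (the text inside each tag is a placeholder):

```html
<body>
  <header>Site logo and navigation</header>
  <main>
    <article>
      <h1>Main content (MC) of the page</h1>
      <section>A thematic block within the article</section>
    </article>
    <aside>Supplementary content (SC): related links, widgets</aside>
  </main>
  <footer>Contacts and legal information</footer>
</body>
```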
So, semantic mark-up consists of using around 15 tags that define the individual components in the page hierarchy. With this code structure you can explicitly tell Google the purpose and content of each individual page, and separate the main content (MC) from supplementary content (SC) and adverts.
In any case, in order for Google to better understand the purpose, function, content and usefulness of a web page, implement and use semantic HTML.
When working on restoring traffic and recovering from quality updates, check that your semantic markup (if any) is implemented and used correctly.
Hreflang tags and international targeting
On multi-language sites, you can often see problems with incorrect language versions. The hreflang attribute (rel="alternate" hreflang="x") helps the search engine understand which page should be shown to visitors based on their location. This attribute should be used if you have a multilingual site and you want users from other countries to easily find content in their own language.
You should make sure that all links in the hreflang attributes are absolute URLs that return a 200 status code; otherwise search engines will not be able to interpret them correctly, and as a result the wrong language version of the page will be shown to the relevant audience.
Errors with language versions can take different forms, but the main point is that they are extremely common on multi-language sites, which makes checking them mandatory.
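A minimal hreflang set for a hypothetical page with English and German versions. Every URL is absolute and returns 200, the set is repeated identically on each listed page, and each page references itself as well as its alternates:

```html
<link rel="alternate" hreflang="en" href="https://www.mydomain.com/en/page/">
<link rel="alternate" hreflang="de" href="https://www.mydomain.com/de/page/">
<!-- x-default: the fallback for users whose language/region has no dedicated version -->
<link rel="alternate" hreflang="x-default" href="https://www.mydomain.com/en/page/">
```

Missing return links are the classic failure mode: if the German page does not link back to the English one, Google may ignore the annotations entirely.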
Checklist for usage of hreflangs
Structured data
As with semantic HTML, structured data can be used to demonstrate the function of web pages to Googlebot. In addition, structured data is required for eligibility for so-called rich results in search.
The use of structured data on YMYL sites is a must, as it solves many problems: describing the site's brand, indicating the type of content, and improving visibility in the SERP, including via the "position zero" featured snippet. I believe that the minimum required structured data types are:
Naturally, the sets of structured data types depend on the type of site, and other data sets can be used.
Using structured data markup, you can stand out well in organic search results. At the moment I especially recommend applying FAQ markup both to information pages and to product cards. This markup allows you to increase CTR in the SERP, which in turn supports growth in organic positions and traffic.
Here is an example of using FAQ micro markup:
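A sketch of FAQPage markup in JSON-LD (the questions and answers are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long does delivery take?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Delivery usually takes 2-3 business days."
      }
    },
    {
      "@type": "Question",
      "name": "Can I return a product?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, returns are accepted within 14 days of purchase."
      }
    }
  ]
}
</script>
```

Per Google's guidelines, the marked-up questions and answers must also be visible to users on the page itself, not present only in the markup.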