History of Downtimes

Full history of important announcements related to IT4I infrastructure, planned downtimes, outages etc.

Date and time Title and description
2016-06-29 13:50:00 Salomon cluster maintenance outage prolonged

Important! The Salomon cluster maintenance outage will be prolonged until 2016-06-29 20:00 CEST.

2016-06-16 00:00:00 Salomon planned downtime

There's a planned maintenance window from 2016-06-28 09:00 till 2016-06-29 20:00 CEST.

Thank you for understanding,
the IT4Innovations team

2016-05-26 10:31:44 Salomon planned downtime

There's a planned maintenance window from 2016-06-08 09:00 till 2016-06-09 09:00 CEST.

Thank you for understanding,
the IT4Innovations team

2016-04-27 15:57:28 Salomon cluster maintenance outage prolonged

Important! The Salomon cluster maintenance outage will be prolonged until 2016-04-28 14:00 CEST.

2016-04-19 08:20:00 Salomon cluster maintenance outage

A maintenance outage for Salomon is scheduled from 2016-04-26 09:00 till 2016-04-27 14:00 CEST. We are going to reinstall the operating system on the compute nodes, upgrading it to CentOS 6.7.

2016-03-31 19:03:25 Failure on Salomon Cooling System

We have had a very serious issue with the Salomon cooling system since 2016-03-31 10:00. We are working to resolve the issue.

2016-03-31 18:59:04 Salomon Back in Production

As of 2016-03-31 19:30 CET, Salomon is back in production. The outage was caused by an issue in the cooling system.

2016-03-30 15:57:57 PBS malfunction

We've had several issues with the PBS scheduler since 2016-03-30 13:00 CEST. We are still working on it.

2016-03-26 09:52:41 Salomon back to production

We have resolved all the issues with the Salomon cluster.

2016-03-26 09:51:16 Failure on Salomon Cooling Infrastructure

We had an issue with Salomon's cooling infrastructure, which led to InfiniBand and storage outages. We are working to resolve the issue.

2016-03-14 14:57:51 Infrastructure Maintenance

The Salomon supercomputer will be down for maintenance from 2016-03-22 10:00 CEST to 2016-03-22 17:00 CEST.

2016-02-24 12:30:00 Anselm Upgrade

The Anselm supercomputer will be down for updates from 2016-02-01 to 2016-02-29.

2016-02-23 08:45:45 Anselm Upgrade

The Anselm supercomputer will be down for updates from 2016-02-01 to 2016-02-29.

2016-02-14 08:07:46 Failure on Salomon Infiniband Network

We have had a very serious issue with the Salomon InfiniBand network since 2016-02-11 10:18. We are working to resolve the issue as quickly as possible and apologize for any inconvenience.

2016-02-04 11:53:30 Short network outage

We need to apply some changes to the network device settings, which may cause a short network outage for the Anselm login nodes. This work will start around 6 am on 2016-01-26.

Thanks for your understanding.

2016-02-04 11:52:37 Salomon Upgrade

The Salomon supercomputer will be down for updates from 2016-02-16 09:00 CEST to 2016-02-16 13:00 CEST.

2016-02-04 11:52:25 Salomon Upgrade

The Salomon supercomputer will be down for updates from 2016-02-16 09:00 CEST to 2016-02-16 13:00 CEST.

2015-11-26 08:11:25 /home downtime

Dear HPC users

A /home downtime on the Salomon supercomputer is planned for 25th November. The reason is maintenance of the underlying CXFS filesystem. Your jobs will be scheduled with respect to this maintenance window.

Thank you for understanding,
the IT4I team

2015-11-24 09:10:15 The /home filesystem was down

On 2015-11-23, from 13:55 to 14:55, the /home filesystem was down due to acute technical problems.
We apologize for the inconvenience.

2015-09-04 16:14:12 SCRATCH downtime

Dear IT4I users

Salomon's SCRATCH will *not* be accessible on Thursday (10th September 2015) from 13:00 till 18:00 CEST.

Thank you for understanding,
the IT4Innovations team

2015-08-27 00:00:00 Today's SCRATCH downtime

Dear IT4I users

We are sorry that the SCRATCH filesystem was inaccessible today (27th August). The cause was a broken service that normally provides mapping for user/group IDs (UIDs/GIDs). The issue has been fixed. No data were lost.

Thank you for understanding,
the IT4Innovations team

2015-08-12 00:00:00 Unplanned downtime

Dear Salomon users,

there was an unplanned downtime of the non-accelerated nodes. The systems are now booting and we are assessing the consequences, one of which is that the SCRATCH filesystem is temporarily inaccessible.

We're sorry for the inconvenience,
the IT4I team

2015-08-06 00:00:00 SCRATCH downtime

Dear IT4I users

Salomon's SCRATCH will not be accessible tomorrow (7th August 2015) from 08:30 till 11:00 CEST.

Thank you for understanding,
the IT4Innovations team

2014-11-14 10:27:51 Unplanned PBS Downtime

Dear Anselm users,

we apologize for the unavailability of our PBS scheduler over the last weekend. Running jobs should not have been affected during that time.

Thank you for understanding,
Anselm Admins

2014-11-14 10:27:50 Login1 troubles

Login1 had a short unplanned downtime. Sorry for the troubles.

2014-10-14 20:30:00 Unexpected power failure

Dear Anselm users,


on Tuesday 14th, at approximately 17:20 CEST, we encountered a power failure during service work on the backup diesel generator. The system shut down. Additional checks after the shutdown took more time than expected. The system was back online with all services at approximately 21:00 CEST. We are very sorry for any trouble this may have caused you. If some of your jobs ended in an incorrect state, please feel free to reclaim your core hours.


Thank you for understanding, 
Anselm Administrators

2014-07-17 13:50:00 Login2(!) downtime

Dear Anselm users,

there's an upgrade planned on Friday, 18th July from 13:00 till 16:00 CEST. Please keep in mind that login2.anselm.it4i.cz will be unavailable during this time frame. We are sorry for the inconvenience.

Thank you for understanding,
Anselm Admins

2014-07-16 13:11:34 Login1 downtime

Dear Anselm users,

there's an upgrade planned on Thursday, 17th July from 13:00 till 16:00 CEST. Please keep in mind that login1.anselm.it4i.cz will be unavailable during this time frame. We are sorry for the inconvenience.

Thank you for understanding,
Anselm Admins

2014-06-18 10:51:56 Login2 downtime

Dear Anselm users,

there's an upgrade planned on Wednesday, 18th June from 11:20 till 14:20 CEST. Please keep in mind that login2.anselm.it4i.cz will be unavailable during this time frame. We are sorry for the inconvenience.

Thank you for understanding,
Anselm Admins

2014-05-22 00:00:00 Outage

Dear Anselm users.

Today (2014-05-23 10:45) we had an unplanned outage of a few nodes. The affected nodes were cn[117-126,193-195].

Sorry for the inconvenience,
Anselm admins
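
For readers who want to check their jobs against the affected nodes, the bracketed expression cn[117-126,193-195] can be expanded into individual node names. The following is only a minimal sketch: the function name `expand_nodes` is hypothetical, and it assumes the simple "prefix[ranges]" form used in this announcement rather than any full scheduler syntax.

```python
import re


def expand_nodes(expr: str) -> list[str]:
    """Expand an expression such as 'cn[117-126,193-195]' into node names.

    Hypothetical helper; handles only a prefix followed by one bracketed
    list of numbers and inclusive ranges.
    """
    match = re.fullmatch(r"([A-Za-z_]+)\[([0-9,\-]+)\]", expr)
    if match is None:
        raise ValueError(f"unsupported node expression: {expr!r}")
    prefix, body = match.groups()
    nodes = []
    for part in body.split(","):
        first, _, last = part.partition("-")
        # A single number such as '42' has no '-', so treat it as a range of one.
        last = last or first
        for number in range(int(first), int(last) + 1):
            nodes.append(f"{prefix}{number}")
    return nodes


affected = expand_nodes("cn[117-126,193-195]")
print(len(affected))              # 13
print(affected[0], affected[-1])  # cn117 cn195
```

The resulting list can then be compared with the execution hosts reported for a job.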

2014-04-11 11:30:00 Heartbleed bug

Dear users of the Anselm cluster,

A serious bug in the OpenSSL library, known as the "Heartbleed bug", has recently been discovered. We would like to assure you that IT4I has taken all necessary steps to fix the OpenSSL library on all the systems.
The bug in the OpenSSL library affected many sites worldwide for nearly two years. At this moment, there is no evidence that any abuse of data took place at IT4I.
In order to ensure the security and integrity of IT4I systems, all users will be issued new login credentials, including passwords and SSH keys.
For more information about the Heartbleed bug, please see: https://docs.it4i.cz/heartbleed-bug

Thank you for your understanding.
IT4Innovations team

2014-04-02 13:05:00 Scheduler is Down

We are sorry for the current scheduler issues which are caused by an inconsistency of the internal PBS database. Thus it's not possible to interact with the scheduler now. In addition, some jobs may be affected and some job outputs may not be retrieved at this moment.

2014-03-26 15:50:00 Temporary Scratch Mount on Login1

Because of the Lustre issues (mentioned in previous announcements), there is a temporary mount point for the Scratch filesystem on the login1 node. Please use this path to access your data:

/scratch_nfs/

2014-03-26 13:10:00 Both Login Nodes Inaccessible

We are sorry for the inaccessibility of both login nodes. We are co-operating with our supplier and trying hard to solve this problem as soon as possible.

Thank you for understanding.

2014-03-25 22:05:00 Login1(!) Not Responding

If you are currently having trouble accessing Anselm, please use the address login2.anselm.it4i.cz instead of anselm.it4i.cz. There's a Lustre issue with the login1 node, which causes it to stop responding.

We'll let you know by MOTD when login1 comes back online.

2013-12-03 00:00:00 Planned Downtime

On 17th December, from 08:00 to 18:00 CET, Anselm will be down for maintenance. A power supply upgrade will take place, as well as system maintenance and software updates.

Prior to the period:
- Jobs will be scheduled with respect to the downtime.

During the period:
- No Anselm HPC service will be available.
- The following web applications will not be accessible: Request Tracker, Anselm cluster documentation, Anselm Allocation.
- Submitting tickets through the e-mail address will be delayed.

After the period:
- All services will be brought back to normal.
- Jobs in a 'Q' state will be scheduled for running.

We are sorry for the inconvenience.

2013-10-14 00:00:00 Cooling system unstable

Dear Anselm users,

there was an unplanned downtime due to severe issues with the cold doors today. We are trying hard to bring all services up. We expect to finish the maintenance at about 13:30 CEST.

Thank you for understanding.

Sincerely yours,
Anselm admins

2013-09-17 15:50:00 A Fair Amount of Nodes Down

Dear Anselm users,

We had an outage on the Anselm Cluster. A fair number of the nodes were unavailable for production.

Please consider resubmitting any terminated jobs.

We are sorry for the troubles,
Anselm admins

2013-08-23 15:25:00 Infiniband Maintenance Window

Dear Anselm users,

We would like to inform you about a planned Infiniband maintenance window, on Wednesday, 28th August from 09:00 till 16:30 CEST.
No Anselm service will be available during this outage.

New batch jobs will not be scheduled for running during this time. Consider adjusting the job walltime so that the job can finish before the downtime (see Job Submission in the Anselm Documentation).

Sincerely yours,
Anselm Admins
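
As an illustration of the walltime advice above, the longest walltime that still ends before the maintenance window can be computed from the planned start time. This is only a sketch under stated assumptions: the function `max_walltime` and the 15-minute safety margin are hypothetical, the job is assumed to start immediately at `now`, the year 2013 is taken from the date of this announcement, and both timestamps are assumed to be in the same local time zone.

```python
from datetime import datetime, timedelta


def max_walltime(now: datetime, downtime_start: datetime,
                 margin: timedelta = timedelta(minutes=15)) -> str:
    """Return the longest walltime (HH:MM:SS) that ends before the downtime.

    Hypothetical helper; the margin is an assumed safety buffer.
    """
    remaining = downtime_start - now - margin
    if remaining <= timedelta(0):
        raise ValueError("no time left before the downtime")
    total_seconds = int(remaining.total_seconds())
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Maintenance window announced above: Wednesday, 28th August, 09:00 CEST.
window_start = datetime(2013, 8, 28, 9, 0)
print(max_walltime(datetime(2013, 8, 27, 18, 30), window_start))  # 14:15:00
```

The returned string is in the HH:MM:SS form commonly used for walltime requests; the exact submission syntax is described in the Job Submission section of the Anselm Documentation.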

2013-08-14 17:10:00 Planned Upgrade / Scheduler Downtime

Dear Anselm users,

there's an upgrade planned on Thursday, 15th August from 18:00 till 22:00 CEST. Please keep in mind that the PBS scheduler won't accept your jobs during this time frame. We are sorry for the inconvenience.

Thank you for understanding,
Anselm Admins

2013-07-16 13:45:00 Anselm Cluster Upgrade - July 23rd

Dear Anselm users.

We would like to inform you that the Anselm cluster will be unavailable due to upgrades on Tuesday, July 23rd from 07:00 to 20:00 CEST.

Sorry for the inconvenience,
Anselm admins

--

Anselm cluster documentation can be found at:
http://support.it4i.cz/docs/anselm-cluster-documentation/

2013-06-27 09:35:00 SSH Password Authentication

Dear Anselm users.

If you are experiencing trouble when using SSH PasswordAuthentication on the client side, please switch to PubkeyAuthentication instead.

We are trying hard to resolve this issue.

Sincerely yours,
Anselm admins
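
One way to follow the SSH advice above is to request public key authentication explicitly on the client command line. The sketch below only builds such a command. It relies on the standard OpenSSH client options `PubkeyAuthentication` and `PasswordAuthentication`; the function name `ssh_pubkey_command` and the user name `alice` are hypothetical, and it assumes a key pair is already set up for the account.

```python
import shlex


def ssh_pubkey_command(user: str, host: str) -> list[str]:
    """Build an ssh command line that uses public key authentication only.

    Hypothetical helper; it does not run ssh, it only returns the arguments.
    """
    return [
        "ssh",
        "-o", "PubkeyAuthentication=yes",   # try the user's key pair
        "-o", "PasswordAuthentication=no",  # do not fall back to passwords
        f"{user}@{host}",
    ]


command = ssh_pubkey_command("alice", "anselm.it4i.cz")
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` to start the session from a script.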