Commit c2d1f164 authored by Lukáš Krupčík

Merge branch 'add_changes]' into 'master'

Add changes]

See merge request !48
parents 5718aed4 c483cc2f
File deleted
History of Downtimes
====================
Full history of important announcements related to the IT4I infrastructure: planned downtimes, outages, etc.
|Date and time |Title and description |
| --- | --- |
|2016-08-29 00:00:00 |**Salomon Maintenance** The Salomon supercomputer will be down for updates from 2016-09-19 11:00 CEST to 2016-09-21 11:00 CEST. |
|2016-07-29 00:00:00 |**Anselm outage** There was an unplanned outage of the Anselm cluster due to network problems. Anselm is now back in production. |
|2016-07-27 13:28:51 |**Back in Production** The Salomon and Anselm supercomputers are online! The flooding in the computer room caused by the failed coolant water pipe is now cleared; damage to the infrastructure is virtually none. The 8th Open Access Call deadline is extended until Friday 2016-07-29 to compensate for the inaccessibility of the extranet.it4i.cz portal. |
|2016-07-20 08:45:00 |**Anselm planned downtime** There's a planned maintenance window from 2016-08-16 07:00 till 2016-08-18 16:00 CEST. |
|2016-06-29 13:50:00 |**Salomon cluster maintenance outage prolonged** Important! The Salomon cluster maintenance outage will be prolonged until 2016-06-29 20:00 CEST. |
|2016-06-16 00:00:00 |**Salomon planned downtime** There's a planned maintenance window from 2016-06-28 09:00 till 2016-06-29 20:00 CEST. Thank you for understanding, the IT4Innovations team |
|2016-05-26 10:31:44 |**Salomon planned downtime** There's a planned maintenance window from 2016-06-08 09:00 till 2016-06-09 09:00 CEST. Thank you for understanding, the IT4Innovations team |
|2016-04-27 15:57:28 |**Salomon cluster maintenance outage prolonged** Important! The Salomon cluster maintenance outage will be prolonged until 2016-04-28 14:00 CEST. |
|2016-03-31 19:03:25 |**Failure on Salomon Cooling System** We have had a very serious issue with the Salomon cooling system since 2016-03-31 10:00. We are working to resolve the issue. |
|2016-03-31 18:59:04 |**Salomon Back in Production** As of 2016-03-31 19:30 CET, Salomon is back in production. The outage was caused by an issue in the cooling system. |
|2016-03-30 15:57:57 |**PBS malfunction** We've had several issues with the PBS scheduler since 2016-03-30 13:00 CEST. We are still working on it. |
|2016-03-26 09:52:41 |**Salomon back to production** We have resolved all the issues with the Salomon cluster. |
|2016-03-26 09:51:16 |**Failure on Salomon Cooling Infrastructure** We had an issue with the cooling infrastructure of Salomon, which led to an InfiniBand and storage outage. We are working to resolve the issue. |
|2016-03-14 14:57:51 |**Infrastructure Maintenance** The Salomon supercomputer will be down for maintenance from 2016-03-22 10:00 CEST to 2016-03-22 17:00 CEST. |
|2016-02-24 12:30:00 |**Anselm Upgrade** The Anselm supercomputer will be down for updates from 2016-02-01 to 2016-02-29. |
|2016-02-23 08:45:45 |**Anselm Upgrade** The Anselm supercomputer will be down for updates from 2016-02-01 to 2016-02-29. |
|2016-02-14 08:07:46 |**Failure on Salomon InfiniBand Network** We have had a very serious issue with the Salomon InfiniBand network since 2016-02-11 10:18. We are working to resolve the issue as quickly as possible and apologize for any inconvenience. |
|2016-02-04 11:53:30 |**Short network outage** We need to apply some changes to the network device settings, which may cause a short network outage for the Anselm login nodes. This work will start around 6 am on 2016-01-26. Thanks for your understanding. |
|2016-02-04 11:52:37 |**Salomon Upgrade** The Salomon supercomputer will be down for updates from 2016-02-16 09:00 CEST to 2016-02-16 13:00 CEST. |
|2016-02-04 11:52:25 |**Salomon Upgrade** The Salomon supercomputer will be down for updates from 2016-02-16 09:00 CEST to 2016-02-16 13:00 CEST. |
|2015-11-26 08:11:25 |**/home downtime** Dear HPC users, there is a /home downtime on the Salomon supercomputer planned for 25th November. The reason is maintenance of the underlying CXFS filesystem. Your jobs will be scheduled with respect to this maintenance window. Thank you for understanding, the IT4I team |
|2015-11-24 09:10:15 |**The /home filesystem was down** On 23.11.2015, 13:55 - 14:55, the /home filesystem was down due to acute technical problems. We apologize for the inconvenience. |
|2015-09-04 16:14:12 |**SCRATCH downtime** Dear IT4I users, Salomon's SCRATCH will *not* be accessible on Thursday (10th September 2015) from 13:00 till 18:00 CEST. Thank you for understanding, the IT4Innovations team |
|2015-08-27 00:00:00 |**Today's SCRATCH downtime** Dear IT4I users, we are sorry for today's (27th August) inaccessibility of the SCRATCH filesystem due to a broken service that normally provides the mapping of user/group IDs (UIDs/GIDs). The issue has been fixed. No data were lost. Thank you for understanding, the IT4Innovations team |
|2015-08-12 00:00:00 |**Unplanned downtime** Dear Salomon users, there was an unplanned downtime of the non-accelerated nodes. At this moment, the systems are booting and we are assessing the consequences. A temporarily inaccessible SCRATCH filesystem is one of them. We're sorry for the inconvenience, the IT4I team |
|2015-08-06 00:00:00 |**SCRATCH downtime** Dear IT4I users, Salomon's SCRATCH will not be accessible tomorrow (7th August 2015) from 08:30 till 11:00 CEST. Thank you for understanding, the IT4Innovations team |
|2014-11-14 10:27:51 |**Unplanned PBS Downtime** Dear Anselm users, we apologize for the unavailability of our PBS scheduler during the past weekend. However, running jobs shouldn't have been affected at that time. Thank you for understanding, Anselm Admins |
|2014-11-14 10:27:50 |**Login1 troubles** Login1 had a short unplanned downtime. Sorry for the trouble. |
|2014-10-14 20:30:00 |**Unexpected power failure** Dear Anselm users, on Tuesday 14th at approximately 17:20 CEST we encountered a power failure during service operation on the backup diesel generator. The system shut down. Additional checks after the shutdown took more time than expected. The system was back online with all services at approximately 21:00 CEST. We are very sorry for any trouble this may have caused you. If some of your jobs ended in an incorrect state, please feel free to reclaim your core hours. Thank you for understanding, Anselm Administrators |
|2014-07-17 13:50:00 |**Login2(!) downtime** Dear Anselm users, there's an upgrade planned on Friday, 18th July from 13:00 till 16:00 CEST. Please keep in mind that login2.anselm.it4i.cz will be unavailable during the given time frame. We are sorry for the inconvenience. Thank you for understanding, Anselm Admins |
|2014-07-16 13:11:34 |**Login1 downtime** Dear Anselm users, there's an upgrade planned on Thursday, 17th July from 13:00 till 16:00 CEST. Please keep in mind that login1.anselm.it4i.cz will be unavailable during the given time frame. We are sorry for the inconvenience. Thank you for understanding, Anselm Admins |
|2014-06-18 10:51:56 |**Login2 downtime** Dear Anselm users, there's an upgrade planned on Wednesday, 18th June from 11:20 till 14:20 CEST. Please keep in mind that login2.anselm.it4i.cz will be unavailable during the given time frame. We are sorry for the inconvenience. Thank you for understanding, Anselm Admins |
|2014-05-22 00:00:00 |**Outage** Dear Anselm users, as of today (2014-05-23 10:45) we had an unplanned outage of a few nodes. The affected nodes were cn[117-126,193-195]. Sorry for the inconvenience, Anselm admins |
|2014-04-11 11:30:00 |**Heartbleed bug** Dear users of the Anselm cluster, a serious bug in the OpenSSL library, known as the "Heartbleed bug", has recently been discovered. We would like to assure you that IT4I has taken all necessary steps to fix the OpenSSL library on all systems. The bug in the OpenSSL library affected many sites worldwide for nearly two years. At this moment, there is no evidence that any abuse of data took place at IT4I. In order to ensure the security and integrity of IT4I systems, all users will be issued new login credentials, including passwords and SSH keys. For more information about the Heartbleed bug, please see: [https://docs.it4i.cz/heartbleed-bug](https://docs.it4i.cz/heartbleed-bug). Thank you for your understanding. IT4Innovations team |
|2014-04-02 13:05:00 |**Scheduler is Down** We are sorry for the current scheduler issues, which are caused by an inconsistency in the internal PBS database. As a result, it is not possible to interact with the scheduler now. In addition, some jobs may be affected and some job outputs may not be retrievable at this moment. |
|2014-03-26 15:50:00 |**Temporary Scratch Mount on Login1** Because of the Lustre issues (mentioned in previous announcements), there is a temporary mount point for the Scratch filesystem on the login1 node. Please use this path to access your data: /scratch_nfs/ |
|2014-03-26 13:10:00 |**Both Login Nodes Inaccessible** We are sorry for the inaccessibility of both login nodes. We are co-operating with our supplier and trying hard to solve this problem as soon as possible. Thank you for understanding. |
|2014-03-25 22:05:00 |**Login1(!) Not Responding** Currently, if you are having trouble accessing Anselm, please use the address login2.anselm.it4i.cz instead of anselm.it4i.cz. There is a Lustre issue with the login1 node that causes it not to respond. We'll let you know via the MOTD when login1 comes back online. |
|2013-12-03 00:00:00 |**Planned Downtime** On 17th December from 08:00 to 18:00 CET, Anselm will be down for maintenance. A power supply upgrade will take place, as well as system maintenance and software updates. Prior to the period: jobs will be scheduled for running with respect to the downtime. During the period: no Anselm HPC service will be available; the following web applications will not be accessible: Request Tracker, Anselm cluster documentation, Anselm Allocation; tickets submitted through the e-mail address will be handled with a delay. After the period: all services will be brought back to normal; jobs in the 'Q' state will be scheduled for running. We are sorry for the inconvenience. |
|2013-10-14 00:00:00 |**Cooling system unstable** Dear Anselm users, there was an unplanned downtime today due to severe issues with the cold doors. We are trying hard to bring all services back up. We expect to finish the maintenance at about 13:30 CEST. Thank you for understanding. Sincerely yours, Anselm admins |
|2013-09-17 15:50:00 |**A Fair Number of Nodes Down** Dear Anselm users, we had an outage on the Anselm cluster. A fair number of nodes were unavailable for production. Please consider resubmitting terminated jobs. We are sorry for the trouble, Anselm admins |
|2013-08-23 15:25:00 |**InfiniBand Maintenance Window** Dear Anselm users, we would like to inform you about a planned InfiniBand maintenance window on Wednesday, 28th August from 09:00 till 16:30 CEST. No Anselm service will be available during this outage. New batch jobs will not be scheduled for running during this time. Consider altering the job walltime so that the job executes before the downtime (see Job Submission in the Anselm documentation). Sincerely yours, Anselm Admins |
|2013-08-14 17:10:00 |**Planned Upgrade / Scheduler Downtime** Dear Anselm users, there's an upgrade planned on Thursday, 15th August from 18:00 till 22:00 CEST. Please keep in mind that the PBS scheduler won't accept your jobs during the given time frame. We are sorry for the inconvenience. Thank you for understanding, Anselm Admins |
|2013-07-16 13:45:00 |**Anselm Cluster Upgrade - July 23rd** Dear Anselm users, we would like to inform you that the Anselm cluster will be unavailable due to upgrades on Tuesday, July 23rd from 07:00 to 20:00 CEST. Sorry for the inconvenience, Anselm admins. The Anselm cluster documentation can be found at: [http://support.it4i.cz/docs/anselm-cluster-documentation/](http://support.it4i.cz/docs/anselm-cluster-documentation/) |
|2013-06-27 09:35:00 |**SSH Password Authentication** Dear Anselm users, if you are experiencing trouble when using SSH PasswordAuthentication on the client side, please switch to PubkeyAuthentication instead. We are trying hard to resolve this issue. Sincerely yours, Anselm admins |