Tuesday, June 28, 2011

SCCM Child Primary site not replicating Update lists or deployments


We ran across an issue this weekend with one of our SCCM child primary sites not replicating deployments or software update lists correctly.  2 of 3 primary sites were replicating fine but one primary was not getting new items.  For this article, I will represent my central site as CEN and the non working primary site as PRI (imaginative, I know).

After several hours with the configuration manager logs, I had pretty much tracked it down to bad configuration items (patches) but I was stuck on what to do after trying various sync options, mostly detailed here:  http://blogcastrepository.com/blogs/brian_tucker/archive/2009/06/18/how-to-resolve-site-replication-issues-with-sccm-2007-sp1-r2.aspx.


I called MS support and he gave me a set up queries to help figure out what was going on.

At the end of the day, the problem can be seen in objreplmgr.log and you see sets of lines like:

Processing replication file C:\Program Files\Microsoft Configuration Manager\inboxes\objmgr.box\INCOMING\Retry\CEN_48964.CID in retry.                SMS_OBJECT_REPLICATION_MANAGER               6/27/2011 9:42:31 AM    5640 (0x1608)
Referenced configuration items are not available yet: http://schemas.microsoft.com/systemsmanagementserver/Site_A4F5CBC7-3D34-4DC1-A024-C496694647C5/SUM_914207c8-1e32-4e4e-acee-7ebdce4568b2/2(2)(6);                SMS_OBJECT_REPLICATION_MANAGER               6/27/2011 9:42:31 AM    5640 (0x1608)
Failed to insert Object e5c2a931-5c7c-4c0b-843a-343525f73e3d from replication file C:\Program Files\Microsoft Configuration Manager\inboxes\objmgr.box\INCOMING\Retry\CEN_48964.CID.                SMS_OBJECT_REPLICATION_MANAGER               6/27/2011 9:42:31 AM    5640 (0x1608)

Note the unique package name highlighted in red above.  This is the key string. 

We did do an ‘attempt 1’ that was unsuccessful but may be worth trying.  In this, we extract just the GUID of the CI and update the date in the CEN database and hope it replicates down.  Get the GUID from the highlighted string above as everything after the SUM_.  In our case it would be 914207c8-1e32-4e4e-acee-7ebdce4568b2.  In the query below, we have multiple CIs identified, you can put as many as you want in the query (gathered from different log lines).

Update ci_configurationitems
set datelastmodified = getdate()
where ci_uniqueid in (‘914207c8-1e32-4e4e-acee-7ebdce4568b2,' c4fca437-b931-48fc-a6bf-6cc7d0aefc2d’)

This didn’t do anything useful in our case so we continued on to ‘Attempt 2’.

If you take your whole string and do a select in your SCCM database on both the primary and the central sites:
SELECT * FROM ci_sdmpackages where SDMPackageName in (‘Site_A4F5CBC7-3D34-4DC1-A024-C496694647C5/SUM_914207c8-1e32-4e4e-acee-7ebdce4568b2’)

Take note of the isDeleted and the SDMPackageVersion columns.  For me, 95% of the items showing up in the logs were showing up as deleted in CEN but not in PRI.  This can be fixed by running an update like:
update ci_sdmpackages
   set Isdeleted = 1
   where SDMPackageName = ' Site_A4F5CBC7-3D34-4DC1-A024-C496694647C5/SUM_914207c8-1e32-4e4e-acee-7ebdce4568b2’ and SDMPackageVersion = '3'

Note that you will change the GUID and the SDMPackage version.  You want to update your PRI site to match CEN.

One your DB is updated, you can move your replication file as ID’d above (CEN_48964.CID in this case) from the objmgr.box\incoming\retry to objmgr.box\incoming.  Watch  your objreplmgr.log for success.

Now, that worked but I had 110 CIDs to work through and I heartily ascribe to larry wall’s 3 virtues.  in order to shorten my workload a bit, this is how I did it.  I didn’t have any good way to communicate between sites, unfortunately.  If I did, I would have done this much more directly.  I was on a bit of a deadline to get this up so I went for speed over elegance.

1.     Get all the guids.  I ran the following against the objreplmgr.log to pull out just my relevant configuration item IDs.  I ran this against a copy w/ just one run of ‘failures’ initially but we do sort out unique so you can probably just run it against the real thing.

$a = get-content .\objreplmgr.log | where {$_ -match "Referenced configuration items are not available"}
$ciList = @()
Foreach ($line in $a) {
   $line -match ".*(Site_.*/SUM_.*?)/.*"
   $ciList += $Matches[1]
}

2.     I tried to strip out duplicates.  On my first run this only saved 2 items, next run saved me much more.  If you run against the whole log, this is more important.
$ciListSorted = $ciList | sort –Unique

3.     Create my sql query:
$sqlQuery = "SELECT * FROM ci_sdmpackages where SDMPackageName in ("
for ($i=0; $i -lt $ciListSorted.length - 1; $i++) {
   $sqlQuery += "'$($ciListSorted[$i])', "
}
$sqlQuery += "'$($ciListSorted[$ciListSorted.Length-1])')"

4.     Run the sql query against both databases.  I have this function in my standard library that I stole some time ago:
function query-SQL {
   param(
         $DBInstance,
         $Datatabase,
         $Query)
  
   $SqlConnection = New-Object System.Data.SqlClient.SqlConnection
   $SqlConnection.ConnectionString = "Server = $DBInstance; Database =$Database; Integrated Security = True"
   $SqlCmd = New-Object System.Data.SqlClient.SqlCommand
   $SqlCmd.CommandText = $Query
   $SqlCmd.Connection = $SqlConnection
   $SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
   $SqlAdapter.SelectCommand = $SqlCmd
   $DataSet = New-Object System.Data.DataSet
   $SqlAdapter.Fill($DataSet)
   $SqlConnection.Close()
   return $DataSet.Tables
}

Run against my Primary database

$colPRI = query-SQL -Datatabase $database -DBInstance $dbinstance -Query $sqlQuery
$PRI = @()
Foreach ($line in $colPRI[1]) {
   $PRI += "PRI,$($line.isDeleted),$($line.SDMPackageName),$($line.SDMPackageVersion)"
}
$PRI > PRI.csv


For the CEN site, I manually copied the sql query over w/
$sqlQuery | Out-File sqlquery.txt

Created the text file on the other side and brought it back in and ran it like so:
$sqlquery = Get-Content .\sqlquery.txt

$colCEN = query-SQL -Datatabase $database -DBInstance $dbinstance -Query $sqlQuery
$CEN = @()    
Foreach ($line in $colCEN[1]) {
   $CEN += "CEN,$($line.isDeleted),$($line.SDMPackageName),$($line.SDMPackageVersion)"
}
$CEN > CEN.csv


Then, I brought my CEN.csv back over to PRI and brought it back to my powershell env.

$CEN = Get-Content .\cen.csv

Now we have $CEN and $PRI that are effectively .csv files in a variable.  Check your lengths, do some spot verification that they are the same length and have the same IDs in the same order, etc.  We did sort them above and everything should be fine and loop below does some checking but, well, defense in depth.  Did you remember to back up your databases?  Not my problem if not.


This loop spins through your two arrays and sets the PRI database to match what is in the CEN database.  Note the query-sql line, you will need $database and $DBInstance defined already.

for ($i = 0; $i -lt $PRI.length -1; $i++) {
   $site,$CenIsDel,$CenID,$CenVersion = $CEN[$i].split(",")
   $site,$PriIsDel,$PriID,$PriVersion = $PRI[$i].split(",")
   if ($CenIsDel -eq "True") { $CenIsDel =1 } else { $CenIsDel = 0}
   if ($PriIsDel -eq "True") { $PriIsDel =1 } else { $PriIsDel = 0}
   #Make sure our lines are the same
   if (-not $CenID -eq $PriID) {
         throw ("iteration $i, $CenID not equal to $PriID")
   }
  
   if (($PriIsDel -ne $CenIsDel) -or ($PriVersion -ne $CenVersion)) {
         $query = "update ci_sdmpackages set Isdeleted = $CenIsDel where SDMPackageName = '$CenID' and SDMPackageVersion = '$CenVersion'"
         "Setting $PriID to $CenisDel and $CenVersion"
         query-sql -database $database -dbinstance $dbinstance -query $query
   } else {
         "$PriId already matches central"
   }
}

After this, I dropped all the files from objmgr.box\incoming\retry back into objmgr.box\incoming.  Success!  Well, mostly.  I still had about 8 bad CIDs.  Looking in the log, some of them were due to lines that had two CIs mentioned. 
Referenced configuration items are not available yet: http://schemas.microsoft.com/systemsmanagementserver/Site_A4F5CBC7-3D34-4DC1-A024-C496694647C5/SUM_40d3d142-5ea7-463c-a7c1-9d34cdd5a7fc/1(1)(6);http://schemas.microsoft.com/systemsmanagementserver/Site_A4F5CBC7-3D34-4DC1-A024-C496694647C5/SUM_c4fca437-b931-48fc-a6bf-6cc7d0aefc2d/1(1)(6);

My regex only accounts for 1.  Rather than figure out a better regex, I ran through the steps above again I gathered the 3 CIs that I had that were specified as the second on a line.  Then I ran through the steps above and manually added my 3 CIs to the $ciList, when I was at that step.  Moved the \retry items back into the box and all were consumed successfully.  Gave it about 15 minutes and site replication is working again.  W00t!
               


analytics